
Chapter 10: Parameter Estimation Methods

10.1 Background
10.2 Concept of estimability
    10.2.1 Ill-conditioning
    10.2.2 Structural identifiability
    10.2.3 Numerical identifiability
10.3 Dealing with collinear regressors during multivariate regression
    10.3.1 Problematic issues
    10.3.2 Principal component analysis and regression
    10.3.3 Ridge regression
    10.3.4 Chiller case study analysis involving collinear regressors
    10.3.5 Stagewise regression
    10.3.6 Case study of stagewise regression involving building energy loads
    10.3.7 Other methods
10.4 Non-OLS parameter estimation methods
    10.4.1 General overview
    10.4.2 Error in variable (EIV) and corrected least squares
    10.4.3 Maximum likelihood estimation (MLE)
    10.4.4 Logistic functions
10.5 Non-linear estimation
    10.5.1 Models transformable to linear in the parameters
    10.5.2 Intrinsically non-linear models
10.6 Computer intensive methods
    10.6.1 Robust regression
    10.6.2 Bootstrap sampling


10.3 Dealing with Collinear Regressors

• Strong collinearity means that the regressor variables are "essentially" influencing or explaining the same system behavior.

• For linear models, the Pearson correlation coefficient provides the necessary indication of the strength of this overlap.

• This issue of collinearity between regressors is a very common phenomenon which has important implications during model building and parameter estimation.

• Not only can regression coefficients be strongly biased, but they can even have the wrong sign.


Example 10.3.1. Consider the simple example of a linear model with two regressors, both of which are positively correlated with the response variable y. The data consist of six samples. The pairwise plots shown clearly depict the fairly strong relationship between the two regressors.

Table 10.1 Data table
y    x1    x2
2    1     2
2    2     3
3    2     1
3    5     5
5    4     6
6    5     4

[Two pairwise scatter plots: Variable y vs Variable x1, and Variable x1 vs Variable x2]
Fig. 10.4 Data for Example 10.3.1 to illustrate how multicollinearity in the regressors could result in model coefficients with wrong signs.


From the correlation matrix C for this data, the correlation coefficient between the two regressors is 0.776, which can be considered to be of moderate strength.

Table 10.2 Correlation matrix

      x1       x2       y
x1    1.000    0.776    0.742
x2             1.000    0.553
y                       1.000

An OLS regression results in the following model:

y = 1.30 + 0.75 x1 - 0.05 x2    (10.14)

The model identified suggests a negative correlation between y and x2, which is contrary to both the correlation coefficient matrix and the graphical trend in Fig. 10.4.
- This is due to the high inter-correlation between the regressor variables.
- The matrix (X'X) has become ill-conditioned, so its inverse, which determines the variance-covariance matrix of the estimated regression coefficients, is unstable.
- A simple layperson explanation is that x1 has usurped more than its appropriate share of explicative power to the detriment of x2, which then had to correct itself to such a degree that it ended up assuming a negative correlation.
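As a quick cross-check of Eq. 10.14, the fit can be reproduced in a few lines of Python (a minimal sketch using numpy; the data are those of Table 10.1):

```python
import numpy as np

# Data from Table 10.1 (Example 10.3.1)
y  = np.array([2, 2, 3, 3, 5, 6], dtype=float)
x1 = np.array([1, 2, 2, 5, 4, 5], dtype=float)
x2 = np.array([2, 3, 1, 5, 6, 4], dtype=float)

# Pairwise correlations (compare with Table 10.2): both regressors correlate positively with y
print(np.corrcoef([x1, x2, y]).round(3))

# OLS fit of y = b0 + b1*x1 + b2*x2 (Eq. 10.14)
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b.round(2))   # approx. [1.30, 0.75, -0.05]: the x2 coefficient comes out negative
```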


Mullet (1976), discussing why regression coefficients in the physical sciences often have wrong signs, cites:

(i) Marquardt, who postulated that multicollinearity is likely to be a problem only when correlation coefficients among regressor variables are higher than 0.95, and

(ii) Snee, who used 0.9 as the cut-off point.

Draper and Smith (1981) state that multicollinearity is likely to be a problem if the simple correlation between two regressors is larger than the correlation of either regressor with the dependent variable.

Significant collinearity between regressor variables is likely to lead to two different problems:

(i) though the model may provide a good fit to the current data, its usefulness as a reliable predictive model is suspect. The regression coefficients and the model predictions tend to have large standard errors and uncertainty bands, which makes the model unstable. It is imperative that a cross-validation evaluation on a separate sample be performed to identify a suitable model;

(ii) the regression coefficients in the model are no longer proper indicators of the relative physical importance of the regressor parameters.


10.3.2 Principal component analysis and regression

Principal Component Analysis (PCA) is one of the best known multivariate methods for removing the adverse effects of collinearity.

It has a simple intuitive appeal, and though very useful in certain disciplines (such as the social sciences), its use has been rather limited in engineering applications.

It is not a statistical method leading to a decision on a hypothesis, but a general method of identifying which parameters are collinear and reducing the dimension of multivariate data.

This reduction in dimensionality is sometimes useful for gaining insights into the behavior of the data set.

It also allows for more robust model building


The variance in the collinear multi-dimensional data comprising the regressor variable vector X can be reframed in terms of a set of orthogonal (i.e., uncorrelated) transformed variables, the vector U.

This vector will then provide a means of retaining only a subset of variables which explain most of the variability in the data.

Thus, the dimension of the data will be reduced without losing much of the information (reflected by the variability in the data) contained in the original data set

[Sketch of data points shown against the original variable axes (x1, x2) and the rotated variable axes (u1, u2)]
Fig. 10.5 Geometric interpretation of what a PCA analysis does in terms of variable transformation for the case of two variables.


• Usually PCA analysis is done with standardized variables Z instead of the original variables X such that variables Z have zero mean and unit variance.

• The real power of this method is when one has a large number of dimensions; in such cases one needs to have some mathematical means of ascertaining the degree of variation in the multi-variate data along different dimensions.

• This is achieved by examining the eigenvalues. Each eigenvalue is indicative of the length of the corresponding axis, while the associated eigenvector specifies the direction of rotation.


Recall that the eigenvalues λ (also called characteristic roots or latent roots) and the eigenvectors a of a square matrix A are defined by:

A.a = λ.a    (10.15)

Here the relevant matrix is the covariance matrix of the standardized data Z, so the eigenvalues are the solutions of:

|Z'Z - λI| = 0    (10.16)

Because the original data or regressor set X is standardized, an important property of the eigenvalues is that their sum is equal to the trace of the correlation matrix C, i.e.,

λ1 + λ2 + ... + λp = p    (10.17)

where p is the dimension or number of variables. This follows from the fact that each diagonal element of a correlation matrix is unity, so the trace equals p.
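A minimal numerical illustration of Eqs. 10.15-10.17, using numpy and the two regressors of Table 10.1 (the variable names are illustrative):

```python
import numpy as np

# Standardize the two regressors of Table 10.1 (zero mean, unit variance)
X = np.array([[1, 2], [2, 3], [2, 1], [5, 5], [4, 6], [5, 4]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix C and its eigen-decomposition (Eqs. 10.15-10.16)
C = Z.T @ Z / (len(Z) - 1)
eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
eigvals = eigvals[::-1]                # rank from largest to smallest

print(eigvals.round(3))                # [1.776, 0.224] since r(x1, x2) = 0.776
print(eigvals.sum().round(3))          # equals p = 2, the trace of C (Eq. 10.17)
```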


Usually, the eigenvalues are ranked such that the first has the largest numerical value, the second the second largest, and so on. The corresponding eigenvectors represent the coefficients of the principal components (PCs). Thus, the linearized transformation for the PCs from the original vector of standardized variables Z can be represented by:

PC1:  u1 = a11.z1 + a12.z2 + ... + a1p.zp   subject to   a11^2 + a12^2 + ... + a1p^2 = 1

PC2:  u2 = a21.z1 + a22.z2 + ... + a2p.zp   subject to   a21^2 + a22^2 + ... + a2p^2 = 1

...

where the aij are called the component weights and are the scaled elements of the corresponding eigenvectors. Thus, the correlation matrix for the standardized and rotated variables is now transformed into:

diag(λ1, λ2, ..., λp)   where   λ1 ≥ λ2 ≥ ... ≥ λp    (10.19)

Note that the off-diagonal terms are zero because the variable vector U is orthogonal. Further, note that the eigenvalues represent the variability of the data along the principal components.


If one keeps all the PCs, nothing is really gained in terms of reduction in dimensionality; however, since the PCs are orthogonal (i.e., uncorrelated), model building by regression will still be more robust.

Model reduction is done by rejecting those transformed variables U which contribute little or no variance.

Since the eigenvalues are ranked, PC1 explains the most variability in the original data while each succeeding eigenvalue accounts for increasingly less.

A typical rule of thumb to determine the cut-off is to drop any factor which explains less than (1/p) of the variability, where p is the number of parameters or the original dimension of the regressor data set.
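A minimal sketch of this reduction step in Python (numpy only; X is assumed to be an n x p array of regressor data such as that of Table 10.4, and the function name is illustrative):

```python
import numpy as np

def pca_reduce(X):
    """Standardize X, form the correlation matrix, and retain only those PCs
    explaining more than (1/p) of the total variance (the rule of thumb above)."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
    C = np.corrcoef(Z, rowvar=False)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]                   # rank eigenvalues, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    frac = eigvals / eigvals.sum()                      # fraction of total variance per PC
    keep = frac > 1.0 / p                               # drop PCs explaining less than 1/p
    scores = Z @ eigvecs[:, keep]                       # retained (orthogonal) PC scores
    return scores, eigvals, frac

# The retained scores can then be used as uncorrelated regressors in place of the collinear X.
```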


Example 10.3.2. Consider the table below with PC rotation:

Extracted   % of Total Variance Accounted for   Eigenvalues
Factors     Incremental    Cumulative           Incremental   Cumulative
PC1         41%            41%                  3.69          3.69
PC2         23             64                   2.07          5.76
PC3         14             78                   1.26          7.02
PC4          7             85                   0.63          7.65
PC5          5             90                   0.45          8.10
PC6          4             94                   0.36          8.46
PC7          3             97                   0.27          8.73
PC8          2             99                   0.18          8.91
PC9          1            100                   0.09          9.00

- PC1 explains 41% of the variation, PC2 23%, and so on, until all nine PCs together explain as much variation as was present in the original data.
- Had the nine PCs been independent or orthogonal, each one would have explained on average (1/p) = (1/9) ≈ 11% of the variance. The eigenvalues listed in the table correspond to the number of original variables that would have explained an amount of variation equivalent to that attributed to the corresponding PC. For example, the first eigenvalue is determined as 41/(100/9) = 3.69, i.e., PC1 has the explicative power of 3.69 of the original variables, and so on.


Example 10.3.3. Reduction in dimensionality using PCA for actual chiller data

Consider the data assembled in Table 10.4, which consists of a data set of 15 possible variables or characteristic features (CFs) under 27 different operating conditions of a centrifugal chiller.

CF1    CF2    CF3    CF4    CF5     CF6    CF7    CF8    CF9     CF10    CF11   CF12   CF13   CF14   CF15
3.765  5.529  5.254  3.244  15.078  4.911  2.319  5.473  83.069  39.781  0.707  0.692  1.090  5.332  0.706
3.405  3.489  3.339  3.344  19.233  3.778  1.822  4.550  73.843  32.534  0.603  0.585  0.720  4.977  0.684
2.425  1.809  1.832  3.500  31.333  2.611  1.009  3.870  73.652  21.867  0.422  0.397  0.392  3.835  0.632
4.512  6.240  5.952  2.844  12.378  5.800  3.376  5.131  71.025  45.335  0.750  0.735  1.260  6.435  0.701
3.947  3.530  3.338  3.322  18.756  3.567  1.914  4.598  71.096  32.443  0.568  0.550  0.779  5.846  0.675
2.434  1.511  1.558  3.633  35.533  1.967  0.873  3.821  72.116  18.966  0.335  0.311  0.362  3.984  0.611
4.748  5.087  4.733  3.156  14.478  4.589  2.752  5.060  70.186  39.616  0.665  0.649  1.107  6.883  0.690
4.513  3.462  3.197  3.444  19.511  3.356  1.892  4.716  69.695  31.321  0.498  0.481  0.844  6.691  0.674
3.503  2.153  2.053  3.789  28.522  2.244  1.272  4.389  68.169  22.781  0.347  0.326  0.569  5.409  0.647
3.593  5.033  4.844  2.122  13.900  4.878  2.706  3.796  72.395  48.016  0.722  0.706  0.990  5.211  0.689
3.252  3.466  3.367  2.122  17.944  3.700  1.720  3.111  76.558  42.664  0.626  0.607  0.707  4.787  0.679
2.463  1.956  2.004  2.233  27.678  2.578  1.102  2.540  73.381  32.383  0.472  0.446  0.415  3.910  0.630
4.274  6.108  5.818  2.056  11.944  5.422  3.323  4.072  71.002  52.262  0.751  0.739  1.235  6.098  0.701
3.678  3.330  3.228  2.089  17.622  3.389  1.907  3.066  70.252  41.724  0.593  0.573  0.722  5.554  0.662
2.517  1.644  1.714  2.256  30.967  2.133  1.039  2.417  69.184  29.607  0.383  0.361  0.385  4.126  0.610
4.684  5.823  5.522  2.122  12.089  4.989  3.140  4.038  71.271  50.870  0.732  0.721  1.226  6.673  0.702
4.641  4.002  3.714  2.456  16.144  3.589  2.188  3.829  70.354  40.714  0.591  0.574  0.928  6.728  0.690
3.038  1.828  1.796  2.689  29.767  1.989  1.061  3.001  70.279  26.984  0.347  0.327  0.470  4.895  0.621
3.763  5.126  4.924  1.400  12.744  4.656  2.687  2.541  73.612  62.921  0.733  0.721  1.030  5.426  0.694
3.342  3.344  3.318  1.567  16.933  3.456  1.926  2.324  70.932  50.601  0.631  0.611  0.698  5.073  0.659
2.526  1.940  2.053  1.378  25.944  2.600  1.108  1.519  74.649  46.588  0.476  0.453  0.421  4.122  0.613
4.411  6.244  5.938  1.522  11.689  5.411  3.383  3.193  70.782  61.595  0.749  0.740  1.282  6.252  0.705
4.029  3.717  3.559  1.178  14.933  3.844  2.128  1.917  69.488  61.187  0.628  0.609  0.817  6.035  0.667
2.815  1.886  1.964  1.378  26.333  2.122  0.946  1.394  79.851  48.676  0.420  0.398  0.448  4.618  0.609
4.785  5.528  5.203  1.611  11.756  5.100  3.052  2.948  69.998  59.904  0.717  0.704  1.190  6.888  0.694
4.443  3.882  3.679  1.933  15.578  3.556  2.121  3.038  70.939  46.667  0.612  0.597  0.886  6.553  0.678
3.151  2.010  2.054  1.656  25.367  2.333  1.224  1.859  69.686  41.716  0.409  0.390  0.500  5.121  0.615


With the intention of reducing the dimensionality of the data set, a PCA is performed in order to determine an optimum set of principal components.

Table 10.5 Eigenvalue table

Component   Eigenvalue     Percent of   Cumulative
Number                     Variance     Percentage
1           10.6249        70.833        70.833
2            2.41721       16.115        86.947
3            1.34933        8.996        95.943
4            0.385238       2.568        98.511
5            0.150406       1.003        99.514
6            0.0314106      0.209        99.723
7            0.0228662      0.152        99.876
8            0.00970486     0.065        99.940
9            0.00580352     0.039        99.979
10           0.00195306     0.013        99.992
11           0.000963942    0.006        99.998
12           0.000139549    0.001        99.999
13           0.0000495725   0.000       100.000
14           0.0000261991   0.000       100.000
15           0.0000201062   0.000       100.000

[Scree plot of eigenvalue vs component number]
Fig. 10.6 Scree plot of Table 10.5 data. It is adequate to retain only 3 PCs.


The equations of the principal components can be deduced from the table of component weights shown. For example, the first principal component has the equation:

PC1 = 0.268037*CP1 + 0.215784*CP10 + 0.294009*CP11 + 0.29512*CP12 + 0.302855*CP13 + 0.247658*CP14 + 0.29098*CP15 + 0.302292*CP2 + 0.301159*CP3 - 0.06738*CP4 - 0.297709*CP5 + 0.297996*CP6 + 0.301394*CP7 + 0.123134*CP8 - 0.0168*CP9

where the values of the variables in the equation are standardized by subtracting their means and dividing by their standard deviations.

Table 10.6 Component weights for Example 10.3.3 (part of table shown)

        Component 1   Component 2   Component 3
CP1     0.268037       0.126125     -0.303013
CP10    0.215784      -0.447418     -0.0319413
CP11    0.294009      -0.0867631     0.155947
CP12    0.29512       -0.0850094     0.149828
CP13    0.302855       0.0576253    -0.0135352
CP14    0.247658       0.117187     -0.388742
CP15    0.29098        0.133689      0.08111


Summary
• If the principal components could be interpreted in physical terms, PCA would be an even more valuable tool. Unfortunately, this is often not the case. Though it has been shown to be useful in the social sciences as a way of finding effective combinations of variables, it has had limited success in the physical and engineering sciences.

• Draper and Smith (1981) caution that PCA may be of limited usefulness in the physical and engineering sciences, in contrast to the social sciences, where models are generally weak and numerous correlated regressors tend to be included in the model.

• Reddy and Claridge (1992) conducted synthetic experiments to evaluate the benefits of PCA against multiple linear regression (MLR) for modeling energy use in buildings:

- they concluded that only when the data are poorly explained by the MLR model and when correlation strengths among regressors are high is there a possible benefit to PCA over MLR;

- however, injudicious use of PCA may exacerbate rather than overcome the problems associated with multicollinearity.


10.3.3 Ridge regression

This method results in more stable estimates than those of OLS in the sense that they are more robust, i.e., less affected by slight variations in the estimation data.

There are several alternative ways of defining and computing ridge estimates; the ridge trace is perhaps the most intuitive.

It is best understood in the context of a graphical representation which unifies the problems of detection and estimation. Since (X'X) is close to singular, the approach introduces a known amount of "noise" via a factor k, which makes the matrix (X'X + k.I) better conditioned and its inversion less sensitive to multicollinearity.


With this approach, the ridge estimate of the parameter vector is given by:

bRidge = (X'X + k.I)^-1 X'Y    (10.20)

where I is the identity matrix. The parameter variance is given by:

var(bRidge) = s^2 (X'X + k.I)^-1 X'X (X'X + k.I)^-1    (10.21a)

with prediction bands:

var(y0_hat)Ridge = s^2 {1 + x0' [(X'X + k.I)^-1 X'X (X'X + k.I)^-1] x0}    (10.21b)

where s is the standard deviation of the residuals. Ridge regression should be performed with standardized variables in order to remove large differences in the numerical values of the different regressors.

[Sketch comparing the sampling distribution of an OLS estimate (central estimate coinciding with the true value) with that of a ridge regression estimate (biased central estimate)]
Fig. 10.8 Ridge regression (RR) estimates are biased compared to OLS estimates but the variance of the parameters will (hopefully) be smaller, as shown.
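A minimal sketch of the ridge estimator of Eq. 10.20 and the parameter covariance of Eq. 10.21a (numpy; the function assumes X already holds standardized regressors and y is centered):

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimates (Eq. 10.20) and their covariance (Eq. 10.21a).
    X is assumed standardized (zero-mean, unit-variance columns) and y centered."""
    n, p = X.shape
    XtX = X.T @ X
    A_inv = np.linalg.inv(XtX + k * np.eye(p))
    b_ridge = A_inv @ X.T @ y                  # Eq. 10.20
    resid = y - X @ b_ridge
    s2 = resid @ resid / (n - p)               # residual variance s^2
    cov_b = s2 * A_inv @ XtX @ A_inv           # Eq. 10.21a
    return b_ridge, cov_b

# k = 0 recovers the OLS estimates; increasing k shrinks (and biases) the coefficients.
```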


One increases the value of the "jiggling factor" k from 0 (the OLS case) towards 1.0 to determine the optimum value of k, namely the one which yields the least model mean square error (MSE).

This MSE is usually evaluated on a cross-validation or testing data set (see Section 5.2.3d), not on the training data set from which the model was developed. Usually the optimum value of k lies in the range 0-0.2.

Unfortunately, many practical problems exhibit all the classical signs of multicollinear behavior but, often, applying PCA or ridge analysis does not necessarily improve the prediction accuracy of the model over the standard multi-linear OLS regression (MLR). The case study below illustrates such an instance.
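The search for k can be sketched as follows, assuming the ridge_fit function from the previous sketch and hypothetical training/validation arrays X_tr, y_tr, X_val, y_val:

```python
import numpy as np

# Sweep the ridge factor k and pick the value minimizing the validation MSE.
k_grid = np.linspace(0.0, 0.4, 81)          # k = 0 is the OLS case
mse = []
for k in k_grid:
    b_k, _ = ridge_fit(X_tr, y_tr, k)       # ridge_fit as sketched above
    pred = X_val @ b_k
    mse.append(np.mean((y_val - pred) ** 2))

k_opt = k_grid[int(np.argmin(mse))]
print(f"optimum k = {k_opt:.3f}, validation MSE = {min(mse):.3g}")
```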

[Plot of mean square error of model predictions vs ridge factor k, with a minimum at k = 0.131]
Fig. 10.7 The optimal value of the ridge factor k is the value for which the mean square error (MSE) of model predictions is minimum. k = 0 corresponds to OLS estimation. In this case, k = 0.131 is optimal with MSE = 3.78 x 10^4.


10.3.5 Stagewise regression

Another approach that offers the promise of identifying sound parameters of a linear model whose regressors are correlated is stagewise regression (Draper and Smith, 1981).

It was extensively used prior to the advent of computers.

The approach, though limited in use nowadays, is still a useful technique to know, and can provide some insights into model structure. Consider the following multivariate linear model with collinear regressors:

y = β0 + β1.x1 + ... + βp.xp    (10.23)

The basic idea is to perform a simple regression with one regressor at a time, with the order in which they are selected depending on their correlation strength with the response variable. This strength is re-evaluated at each step.


The algorithm describing the overall methodology consists of the following steps:
• Compute the correlation coefficients of the response variable y against each of the regressors, and identify the strongest one, say xi.
• Perform a simple OLS regression of y vs xi, and compute the model residuals u. These residuals become the new response variable.
• From the remaining regressor variables, identify the one most strongly correlated with the new response variable. If this is represented by xj, then regress u vs xj and recompute the second-stage model residuals, which become the new response variable.
• Repeat this process for all remaining significant regressor variables.
• The final model is found by rearranging the terms of the final expression into the standard regression model form, i.e., with y on the left-hand side and the significant regressors on the right-hand side.

The forward stepwise multiple regression method selects which regressor to include in the second stage depending on its correlation strength with the residuals of the model defined in the first stage, whereas stepwise regression selection is based on the correlation strength of the regressor with the response variable. A sketch of the stagewise procedure described above is given below.
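A minimal sketch of the stagewise procedure in Python (numpy; the stopping threshold min_corr is an illustrative stand-in for a formal significance test):

```python
import numpy as np

def stagewise(X, y, min_corr=0.3):
    """Stagewise regression: at each stage, regress the current residuals on the
    single remaining regressor most correlated with them (simple OLS, one at a time)."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept                 # current "response" (residuals)
    remaining = list(range(p))

    while remaining:
        # correlation of each remaining regressor with the current residuals
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        if max(corrs) < min_corr:         # stop when nothing "significant" remains
            break
        j_best = remaining[int(np.argmax(corrs))]
        xj = X[:, j_best]
        # simple OLS of the residuals on xj (slope and intercept of this stage)
        slope = np.cov(xj, resid, ddof=1)[0, 1] / np.var(xj, ddof=1)
        b0 = resid.mean() - slope * xj.mean()
        coef[j_best] += slope
        intercept += b0
        resid = resid - (b0 + slope * xj)  # new response variable for the next stage
        remaining.remove(j_best)

    return intercept, coef                 # final model: y ≈ intercept + X @ coef
```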


10.3.6 Case study of stagewise regression involving building energy loads

Synthetic computer data:

1) Generate data by detailed building energy simulation program

2) Formulate macro-model for the thermal loads of an ideal one-zone building

3) Use a multistage linear regression approach to determine the model coefficients (along with their standard errors)

4) Finally, translate these into estimates of the physical parameters (along with their associated errors).

The evaluation was done for two different building geometries and building mass at two different climatic locations (Dallas, TX and Minneapolis, MN) using daily average or summed data so as to remove/minimize dynamic effects.


QB = ks.qLR.(1 + kl.δ).A + asol.Asol + (U.AS + mv.A.cp).(T0 - Tz) + δ.mv.A.hv.(w0 - wz)    (10.3.10)

where
A     Conditioned floor area of building
AS    Surface area of building
cp    Specific heat at constant pressure of air
hv    Heat of vaporization of water
kl    Ratio of internal latent loads to total internal sensible loads of building
ks    Multiplicative factor for converting qLR to total internal sensible loads
mv    Ventilation air flow rate per unit conditioned area
QB    Building thermal loads
qLR   Monitored electricity use per unit area of lights and receptacles inside the building
T0    Outdoor air dry-bulb temperature
Tz    Thermostat set point temperature
U     Overall building shell heat loss coefficient
w0    Specific humidity of outdoor air
wz    Specific humidity of air inside space
δ     Indicator variable which is 1 when w0 > wz and 0 otherwise


(b) One-step regression approach

One way to identify these parameters is to resort directly to OLS multiple linear regression, provided monitored data of qLR, T0 and w0 are available. For such a scheme, it is more appropriate to combine solar loads into the loss coefficient U and rewrite the macro-model (Eq. 10.3.10) as:

QB/A = a + b.qLR + c.δ.qLR + d.T0 + e.δ.(w0 - wz)    (10.24a)

where the regression coefficients are:

a = -(U.AS/A + mv.cp).Tz
b = ks
c = ks.kl    (10.24b)
d = (U.AS/A + mv.cp)
e = mv.hv

Subsequently, the five physical parameters can be inferred from the regression coefficients as:

ks = b
kl = c/b
mv = e/hv    (10.25)
U.AS/A = d - e.cp/hv
Tz = -a/d

The uncertainty associated with these physical parameters can be estimated from classical propagation of errors.
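A minimal sketch of the back-calculation of Eq. 10.25, with a first-order propagation-of-errors estimate for kl = c/b (numpy; the function names are illustrative, and cp and hv are nominal property values whose units must be consistent with those of the monitored data):

```python
import numpy as np

def physical_params(a, b, c, d, e, cp=1.006, hv=2501.0):
    """Back out the physical parameters of Eq. 10.25 from the regression
    coefficients of Eq. 10.24. cp (kJ/kg.K) and hv (kJ/kg) are nominal
    air/water property values; adjust units to match the monitored data."""
    ks   = b
    kl   = c / b
    mv   = e / hv
    UAsA = d - e * cp / hv
    Tz   = -a / d
    return ks, kl, mv, UAsA, Tz

def sigma_kl(b, c, sb, sc):
    """First-order propagation of errors for kl = c/b
    (b and c are assumed uncorrelated in this simple sketch)."""
    return (c / b) * np.sqrt((sc / c) ** 2 + (sb / b) ** 2)
```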


(c) Two-step regression approach

Earlier studies based on daily data from several buildings in central Texas indicate that, for positive values of (w0 - wz), the variable pairs (i) qLR and T0, (ii) qLR and (w0 - wz), and (iii) T0 and (w0 - wz) are strongly correlated and are likely to introduce bias in the estimation of parameters from OLS regression. It is the last pair of variables which is usually the primary cause of uncertainty in the parameter estimation process. Two-step regression involves separating the data set into two groups depending on whether δ is 0 or 1 (with wz assumed to be 0.01 kg/kg). During the two-month period under conditions of low outdoor humidity, δ = 0, and Eq. 10.24 reduces to:

QB/A = a + b.qLR + d.T0    (10.26)

Since qLR and T0 are usually poorly correlated under such low outdoor humidity conditions, the coefficients b and d deduced from multiple linear regression are likely to be unbiased. For the remaining year-long data when δ = 1, Eq. 10.24 can be rewritten as:

QB/A = a + (b + c).qLR + d.T0 + e.(w0 - wz)    (10.27)

Two variants are possible:
- Option A: Use Eq. 10.26 and determine a, b and d, but only retain the value of b. Then use Eq. 10.27 and determine the a, (b+c), d and e coefficients.
- Option B: Use Eq. 10.26 and determine a, b and d, retaining the values of both b and d. Then use the following modified equation to determine a, c and e from the data when δ = 1 (see the sketch below):

QB/A - d.T0 = a + (b + c).qLR + e.(w0 - wz)    (10.28)

The collinearity effects between qLR and (w0 - wz) when δ = 1 are usually small, and so this variant is likely to yield less biased parameter estimates than variant A.
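A minimal sketch of variant B (numpy; the arrays QB_A, qLR, T0 and w0 stand for hypothetical monitored daily series, with wz = 0.01 kg/kg as stated above):

```python
import numpy as np

def two_step_option_b(QB_A, qLR, T0, w0, wz=0.01):
    """Two-step regression, variant B: fit a, b, d on the delta = 0 subset (Eq. 10.26),
    then fit a, c, e on the delta = 1 subset with d.T0 moved to the left side (Eq. 10.28)."""
    delta = (w0 > wz).astype(float)

    # Step 1: low outdoor-humidity data (delta = 0): QB/A = a + b.qLR + d.T0
    m0 = delta == 0
    X0 = np.column_stack([np.ones(m0.sum()), qLR[m0], T0[m0]])
    a0, b, d = np.linalg.lstsq(X0, QB_A[m0], rcond=None)[0]

    # Step 2: humid data (delta = 1): QB/A - d.T0 = a + (b + c).qLR + e.(w0 - wz)
    m1 = delta == 1
    y1 = QB_A[m1] - d * T0[m1]
    X1 = np.column_stack([np.ones(m1.sum()), qLR[m1], w0[m1] - wz])
    a1, bc, e = np.linalg.lstsq(X1, y1, rcond=None)[0]
    c = bc - b                    # the second-stage slope on qLR is (b + c)

    return dict(a=a1, b=b, c=c, d=d, e=e)
```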


Table 10.12 Steps involved in the multi-stage regression approach using Eq. 10.24. Days when δ = 1 correspond to days with outdoor humidity higher than that indoors.

Step     Dependent variable            Regressor variable   Type of regression   Parameter identified from Eq. 10.3.11   Data set used
Step 1   QB/A                          qLR                  2-P                  b                                       Entire data
Step 2   Y1 = QB/A - b.qLR             T0                   2-P or 4-P           d                                       Entire data
Step 3   Y2 = QB/A - b.qLR - d.T0      qLR                  2-P                  c                                       Data when δ = 1
Step 4   Y2 = QB/A - b.qLR - d.T0      (w0 - wz)            2-P or 4-P           e                                       Data when δ = 1

Table 10.13 Correlation coefficient matrix of various parameters for the two cities selected at the daily time scale for Runs #6 and #7 (R6 and R7).

Dallas
             QB,1-zone   Y1      Y2      qLR     T0      w0z     qLR.δ
QB,1-zone       -        0.85    0.52    0.53    0.88    0.72    0.82
Y1             0.88       -      0.78    0.00    0.97    0.80    0.70
Y2            -0.86     -0.82     -     -0.27    0.59    0.68    0.44
qLR            0.48      0.01   -0.30     -      0.11    0.07    0.42
T0             0.91      0.97   -0.93    0.13     -      0.75    0.72
w0z            0.57      0.59   -0.40    0.10    0.54     -      0.66
qLR.δ          0.66      0.60   -0.48    0.27    0.58    0.72     -

Minneapolis


[Figure: four scatter-plot panels, one per stage of the stagewise regression]
(a) Step 1: Y = QB/A vs qLR; get b in this step
(b) Step 2: Y1 = QB/A - b.qLR vs T0; get d in this step
(c) Step 3: Y2 = Y1 - d.T0 = QB/A - b.qLR - d.T0 vs qLR; get c in this step
(d) Step 4: Y2 = Y1 - d.T0 = QB/A - b.qLR - d.T0 vs (w0 - wz); get e in this step

Fig. 10.12 Different steps in the stagewise regression approach to estimate the four model parameters b, c, d, e following Eq. 10.24, as described in Table 10.12: a. Estimation of parameter b; b. Estimation of parameter d; c. Estimation of parameter c; d. Estimation of parameter e.


Fig. 10.11 Comparison of how the various estimation schemes (R1-R9) were able to recover the “true” values of the four physical parameters of the model given by

Eq. 10.24. The solid lines depict the correct value while the mean values estimated for the various parameters and their 95% uncertainty bands are also shown



7.1 Background

Optimization:

- widely studied under Operations Research

- wide applications for both design and operation

- arise in almost all branches of industry and society

Applies to decision making under low uncertainty