Linear statistical models 2008: Model diagnostics

  • Date posted: 21-Dec-2015

  • Slide 1
  • Linear statistical models 2008. Model diagnostics: residual analysis; outliers; dependence; heteroscedasticity; violations of distributional assumptions; identification of influential observations; examination of over- and under-dispersion.
  • Slide 2
  • A simple model of water clarity. Inputs: year, temperature, salinity, station dummies. Output: Secchi depth (water clarity).
  • Slide 3
  • Sampling sites for water quality in the Stockholm archipelago. [Map; labels: Stockholm, Baltic Sea]
  • Slide 4
  • Raw residuals in generalized linear models. The predicted values are linear combinations of the observed values, i.e. $\hat{y} = Hy$, where $H$ is a symmetric idempotent matrix ($H = HH$). The vector of raw residuals can be written $e = y - \hat{y} = (I - H)y$. In contrast to residuals in general linear models, the raw residuals in generalized linear models may have a variance that is strongly related to the size of $\hat{y}$.
  • Slide 5
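  • Pearson residuals. The Pearson residual is the raw residual standardized with the estimated standard deviation at the fitted value: $e_{P,i} = (y_i - \hat{\mu}_i)/\sqrt{V(\hat{\mu}_i)}$, where $\hat{\mu}_i$ is the fitted mean and $V$ is the variance function. Special cases: Poisson models, $e_{P,i} = (y_i - \hat{\mu}_i)/\sqrt{\hat{\mu}_i}$; binomial models, $e_{P,i} = (y_i - n_i\hat{p}_i)/\sqrt{n_i\hat{p}_i(1 - \hat{p}_i)}$.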
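The two special cases of the Pearson residual can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the fitted means (or fitted probabilities) have already been obtained from some model fit; the function names and the data are hypothetical and are not taken from the slides.

```python
import math

def pearson_poisson(y, mu):
    """Pearson residuals for a Poisson model: (y - mu) / sqrt(mu)."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def pearson_binomial(y, n, p):
    """Pearson residuals for binomial counts y out of n trials,
    given fitted success probabilities p."""
    return [(yi - ni * pi) / math.sqrt(ni * pi * (1.0 - pi))
            for yi, ni, pi in zip(y, n, p)]

# Hypothetical observed counts and fitted means (illustrative only).
y = [2, 5, 9]
mu = [4.0, 4.0, 9.0]
poisson_res = pearson_poisson(y, mu)

# Hypothetical binomial observation: 3 successes in 10 trials, fitted p = 0.5.
binomial_res = pearson_binomial([3], [10], [0.5])
```

Dividing by the square root of the variance function puts observations with large and small fitted means on a comparable scale, which the raw residuals do not.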
  • Slide 6
  • Adjusted Pearson residuals. The Pearson residual can be adjusted by computing $e_{P,i}^{adj} = e_{P,i}/\sqrt{1 - h_{ii}}$, where $e_{P,i}$ is the Pearson residual and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$. The adjusted Pearson residuals can often be assumed to be approximately standard normal.
  • Slide 7
  • Deviance. The deviance is defined as $D = 2\,[\,l(y; y) - l(\hat{\mu}; y)\,]$, where $l(y; y)$ is the log likelihood of the full (saturated) model, and $l(\hat{\mu}; y)$ is the log likelihood of the current model at the ML estimates of its parameters. The deviance is a sum of the contributions to the deviance from each of the observations, $D = \sum_i d_i$.
  • Slide 8
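  • Deviance residuals. The (unadjusted) deviance residuals are defined as $e_{D,i} = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{d_i}$, where $d_i$ is the contribution of observation $i$ to the deviance. The adjusted deviance residuals are defined as $e_{D,i}^{adj} = e_{D,i}/\sqrt{1 - h_{ii}}$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$.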
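As a concrete instance of the deviance and the deviance residuals, the sketch below assumes a Poisson model, for which the contribution of observation i is d_i = 2[y_i log(y_i / mu_i) - (y_i - mu_i)], with the log term taken as zero when y_i = 0. The function names, the data, and the leverages are hypothetical; the fitted means are assumed to come from an earlier model fit.

```python
import math

def poisson_deviance_contributions(y, mu):
    """Per-observation deviance contributions d_i for a Poisson model."""
    d = []
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        # Guard against tiny negative values caused by rounding.
        d.append(max(2.0 * (log_term - (yi - mi)), 0.0))
    return d

def deviance_residuals(y, mu):
    """Unadjusted deviance residuals: sign(y - mu) * sqrt(d_i)."""
    d = poisson_deviance_contributions(y, mu)
    return [math.copysign(math.sqrt(di), yi - mi)
            for yi, mi, di in zip(y, mu, d)]

def adjust(residuals, h):
    """Adjusted residuals: divide each residual by sqrt(1 - h_ii)."""
    return [r / math.sqrt(1.0 - hi) for r, hi in zip(residuals, h)]

# Hypothetical counts, fitted means and leverages (illustrative only).
y = [0, 4, 9]
mu = [2.0, 4.0, 6.0]
h = [0.75, 0.5, 0.5]

d = poisson_deviance_contributions(y, mu)
deviance = sum(d)                    # total deviance D = sum of d_i
res = deviance_residuals(y, mu)
res_adj = adjust(res, h)
```

The same `adjust` helper applies unchanged to Pearson residuals, since both adjustments divide by the square root of 1 - h_ii.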
  • Slide 9
  • Score residuals. The score equations involve sums of terms $U_i$, one for each observation. Properly standardized, these terms can be regarded as residuals.
  • Slide 10
  • Approximate likelihood residuals. Likelihood residuals may, in principle, be computed by comparing the deviance for a model based on all observations with the deviance for a model based on all but the $i$th observation. An approximation of these residuals can be computed from a closed-form formula, without refitting the model.
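One widely used closed-form approximation combines the adjusted Pearson (or score) residual and the adjusted deviance residual, weighted by the leverage: sign(y_i - mu_i) * sqrt(h_ii * r_P^2 + (1 - h_ii) * r_D^2). The sketch below assumes this form, which is an assumption about what the slide intended; the function and argument names are hypothetical.

```python
import math

def likelihood_residual(raw, r_pearson_adj, r_deviance_adj, h):
    """Approximate likelihood residual for one observation.

    raw            : raw residual y_i - mu_i (only its sign is used)
    r_pearson_adj  : adjusted Pearson (or score) residual
    r_deviance_adj : adjusted deviance residual
    h              : leverage h_ii, the diagonal element of the hat matrix
    """
    magnitude = math.sqrt(h * r_pearson_adj ** 2
                          + (1.0 - h) * r_deviance_adj ** 2)
    return math.copysign(magnitude, raw)

# Hypothetical values (illustrative only).
r = likelihood_residual(-3.0, -2.0, -1.0, 0.25)

# With zero leverage the result reduces to the deviance residual.
r_zero_leverage = likelihood_residual(1.0, 2.0, 1.5, 0.0)
```

The weighting means that a high-leverage observation is judged mostly by its Pearson-type residual and a low-leverage observation mostly by its deviance residual.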
  • Slide 11
  • Choice of residuals. Type of residuals: Pearson residuals, deviance residuals, score residuals, likelihood residuals. Test: likelihood ratio test, Wald tests, score tests.
  • Slide 12
  • Influential observations. The leverage (influence) of observation $i$ on the fitted value $\hat{y}_i$ is the derivative of this estimate with respect to $y_i$. Because $\hat{y} = Hy$, these derivatives are given by the diagonal elements $h_{ii}$ of the hat matrix $H$: $\partial\hat{y}_i/\partial y_i = h_{ii}$.
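For intuition about the diagonal of the hat matrix, the sketch below uses the special case of simple linear regression (intercept plus one predictor), where the leverages have the closed form h_ii = 1/n + (x_i - mean(x))^2 / S_xx. This is an illustration for a general linear model only; in a generalized linear model the hat matrix also depends on the fitted weights. The function name and data are hypothetical.

```python
def leverages_simple_regression(x):
    """Diagonal elements h_ii of the hat matrix for y = a + b*x."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]

# Hypothetical predictor values; the last point lies far from the others.
x = [1.0, 2.0, 3.0, 10.0]
h = leverages_simple_regression(x)

# The leverages sum to the number of parameters (here 2).
total = sum(h)
```

The isolated point at x = 10 gets a leverage close to 1, meaning its fitted value is determined almost entirely by its own observed response.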
  • Slide 13
  • Cook's distance. Cook's distance measures the combined change in all parameter estimates when observation $i$ is omitted.
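A common one-step form of Cook's distance avoids refitting the model by using the adjusted Pearson residual and the leverage: C_i = r_i^2 * h_ii / (p * (1 - h_ii)), where p is the number of estimated parameters. The sketch below assumes this form, which may differ in detail from the formula on the original slide; the names are hypothetical.

```python
def cooks_distance(r_pearson_adj, h, n_params):
    """One-step approximation to Cook's distance for one observation.

    r_pearson_adj : adjusted Pearson residual
    h             : leverage h_ii
    n_params      : number of estimated parameters p
    """
    return r_pearson_adj ** 2 * h / (n_params * (1.0 - h))

# Hypothetical values (illustrative only).
c = cooks_distance(2.0, 0.5, 2)
```

The measure is large only when an observation has both a large residual and a high leverage, which is why it is reported alongside the two separately.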
  • Slide 14
  • Over-dispersion. Over-dispersion occurs when the variance of the response is larger than would be expected for the chosen distribution. Example: in a model involving Poisson distributions, the estimated variance is considerably larger than the estimated mean.
  • Slide 15
  • Possible causes of over-dispersion. Lack of homogeneity (the distribution of the target variable varies within experiments that are assumed to be replicates); dependence (the response levels in experiments assumed to be replicates are actually positively correlated).
  • Slide 16
  • Modelling over-dispersion. Introduce an extra scale parameter $\phi$ in the variance function of the response $Y$, so that $\mathrm{Var}(Y) = \phi V(\mu)$. Note that the variance is a function of the mean for all members of the exponential family.
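A simple way to estimate such a scale parameter is the Pearson chi-square statistic divided by its residual degrees of freedom. The sketch below assumes a Poisson model, where V(mu) = mu, and assumes the fitted means come from an earlier model fit; the function name and data are hypothetical.

```python
def pearson_dispersion_poisson(y, mu, n_params):
    """Estimate the scale parameter as Pearson chi-square / (n - p)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts and fitted means (illustrative only).
y = [0, 8, 18, 1]
mu = [4.0, 4.0, 9.0, 1.0]
phi_hat = pearson_dispersion_poisson(y, mu, n_params=1)
```

An estimate close to 1 is consistent with the Poisson assumption, while a value well above 1, as in this constructed example, points to over-dispersion.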
  • Slide 17
  • Software recommendations. General linear models: MINITAB. Generalized linear models: SAS, PROC GENMOD.