Final generalized linear modeling by idrees waris iugc

1. NAME: IDREES WARIS
REG NO: 3095
SEMESTER: 4TH
COURSE: QTIA
COURSE FACILITATOR: SIR IMTIAZ ARIF
GENERALIZED LINEAR MODEL

2. MAIN POINTS TO BE DISCUSSED IN GZLM
What is GZLM, and why use it? (History and explanation)
When to use GZLM (Assumptions)
How to use GZLM in SPSS (Statistical Procedure)
3. What is a generalized linear model (GZLM)?
The generalized linear model is a generalization of the general linear model (GLM), which is discussed separately with regard to ANOVA/ANCOVA and MANOVA/MANCOVA models, as well as regression models.
GZLM allows for dependent variables with non-normal distributions and for many link functions other than identity.
GZLM supports not only traditional regression models but also logistic models for binary dependents, log-linear analysis of count data, Poisson regression for count data, gamma regression, complementary log-log models for interval-censored survival data, and many others.
4. HISTORY
The generalized linear model was introduced by John Nelder and Robert Wedderburn in a 1972 article.
An overview can be found in Gill (2001).
5. Difference between the general linear model (GLM) and the generalized linear model (GZLM)
General linear model (GLM)
The general linear model (GLM) is a flexible statistical model that incorporates normally distributed dependent variables and categorical or continuous independent variables.
GLM enables you to accommodate designs with empty cells, more readily interpret the results using profile plots of estimated means, and customize the linear model so that it directly addresses the research questions you ask.
Anyone who regularly fits linear models, whether univariate, multivariate or repeated measures, will find the GLM procedure to be very useful.
General equation: $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + \varepsilon$
6. GZLM Extensions:
Correlated or clustered data:
Generalized Estimating Equations (GEEs)
Generalized Linear Mixed Models (GLMMs)
Hierarchical Generalized Linear Models (HGLMs)
Generalized additive models (GAMs)
7. Components of GZLM
There are three components of a generalized linear model (GZLM):
1. Random component: identifies the response variable (Y) and specifies/assumes a probability distribution for it.
2. Systematic component: specifies the explanatory or predictor variables (e.g., X1, X2, etc.). These variables enter in a linear manner: $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$.
3. Link function: specifies the relationship between the mean or expected value of the random component (i.e., E(Y)) and the systematic component.
8. Random Component
Let N be the sample size, and suppose we have independent observations Y1, Y2, ..., YN on the response variable. Here we consider responses Y that are discrete, where Y is either:
Counts (including cells of a contingency table):
Number of people who die from AIDS during a given time period.
Number of times a child tries to take a toy away from another child.
Number of patents generated by firms.
These responses have a Poisson distribution.
Dichotomous (binary) with a fixed number of trials:
success/failure
correct/incorrect
agree/disagree
academic/non-academic program
These responses have a Binomial distribution.
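As a quick check of what these two distributions imply for the mean and variance (which is what a GZLM's random component pins down), here is a minimal sketch using only the Python standard library. The values mu = 2.5, n = 10 and p = 0.3 are arbitrary illustrations, and the helper names are hypothetical:

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y successes) in a fixed number n of independent trials."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def mean_and_variance(probs):
    """Mean and variance of a distribution given as [P(Y=0), P(Y=1), ...]."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Poisson with mean 2.5 (the probability beyond y = 59 is negligible)
pois_mean, pois_var = mean_and_variance([poisson_pmf(y, 2.5) for y in range(60)])
# Binomial with n = 10 trials and success probability 0.3
bin_mean, bin_var = mean_and_variance([binomial_pmf(y, 10, 0.3) for y in range(11)])

print(pois_mean, pois_var)  # both approximately 2.5: the Poisson variance equals its mean
print(bin_mean, bin_var)    # approximately 3.0 and 2.1: n*p and n*p*(1-p)
```

The Poisson forces the variance to equal the mean; when observed counts vary more than that (overdispersion), a distribution with an extra parameter, such as the negative binomial, is the usual alternative.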
9. Systematic Component
As in ordinary regression, we are modeling means. The focus is on the expected value of the response variable:
$E(Y) = \mu$
We want to investigate whether and how $\mu$ varies as a function of the levels of the predictor or explanatory variables (the Xs).
The systematic component of the model consists of a set of explanatory variables and some linear function of them.
$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$
This linear combination of the explanatory variables is referred to as the linear predictor. This part of the model is very much like ordinary linear regression.
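The linear predictor is just a weighted sum, exactly as in ordinary regression. A minimal Python sketch for a single observation; the function name, coefficients and x-values are made-up illustrations:

```python
def linear_predictor(beta0, betas, xs):
    """Return beta0 + beta1*x1 + ... + betak*xk for a single observation."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per explanatory variable")
    return beta0 + sum(b * x for b, x in zip(betas, xs))

# beta0 = 0.5, beta1 = 1.2, beta2 = -0.4 with x1 = 2.0, x2 = 3.0
eta = linear_predictor(0.5, [1.2, -0.4], [2.0, 3.0])
print(eta)  # 0.5 + 2.4 - 1.2 = 1.7, up to floating-point rounding
```

In a GZLM this value is not yet the mean of Y; relating the two is the job of the link function.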
10. The Link Function
The left-hand side of the equation/model is the random component; that is,
$E(Y) = \mu$
The right-hand side of the equation is the systematic component; that is,
$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
We now need to link the two sides.
How is $\mu = E(Y)$ related to $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$?
We do this using a link function $g(\mu)$:
$g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
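Writing $\eta$ for the linear predictor (a shorthand introduced here, not used elsewhere in these slides), the link and its inverse can be stated compactly. Because the link function is monotone, the inverse exists:

```latex
\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
g(\mu) = \eta \iff \mu = E(Y) = g^{-1}(\eta)
```

For example, with the log link $g(\mu) = \log\mu$, the mean is $\mu = e^{\eta}$, which is positive for any value of the linear predictor.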
11. More about the Link Function
The link function provides the relationship between the linear predictor and the mean of the distribution function.
Important things about g(.):
This function $g(\cdot)$ is monotone: as the systematic part gets larger, $\mu$ gets larger (or smaller).
The relationship between E(Y ) and the systematic part can be non-linear.
Some common links are:
1. Identity (ordinary regression, ANOVA, ANCOVA):
$E(Y) = \mu = \beta_0 + \beta_1 x$
2. Log link, which is often used when Y is nonnegative (i.e., $0 \le Y$):
$\log(E(Y)) = \log(\mu) = \beta_0 + \beta_1 x$
This yields a loglinear model.
3. Logit link, which is often used when $0 \le \mu \le 1$ (e.g., when the response is dichotomous/binary and we are interested in a probability):
$\log\left(\mu / (1 - \mu)\right) = \beta_0 + \beta_1 x$
12. Link Function: The Canonical Links
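For reference, the canonical link of a distribution is the function of $\mu$ that equals the distribution's natural parameter. Up to sign and constant factors, the standard results for common distributions are:

```latex
\begin{aligned}
\text{Normal:}           \quad & g(\mu) = \mu                          && \text{(identity)}\\
\text{Poisson:}          \quad & g(\mu) = \log \mu                     && \text{(log)}\\
\text{Binomial:}         \quad & g(\mu) = \log\frac{\mu}{1-\mu}        && \text{(logit)}\\
\text{Gamma:}            \quad & g(\mu) = \mu^{-1}                     && \text{(inverse)}\\
\text{Inverse Gaussian:} \quad & g(\mu) = \mu^{-2}                     && \text{(inverse squared)}
\end{aligned}
```

Canonical links are convenient but not required; for example, the gamma distribution is often paired with the log link instead.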
13. When? (ASSUMPTIONS)
Not assumed: compared to the GLM, GZLM/GEE do not assume a normally distributed dependent variable (or normally distributed independents), linearity between the predictors and the dependent variable, or homogeneity of variance across the range of the dependent variable.
Linearity of the link function: the link-transformed dependent variable is assumed to be linearly related to the predictors.
Absence of high multicollinearity
Centered data
Data distribution
Independent vs. correlated data
Data levels
Missing data
14. How to run GZLM in SPSS
Model Types (common model types already provided in SPSS):
Scale Response.
Linear. Specifies Normal as the distribution and Identity as the link function.
Gamma with log link. Specifies Gamma as the distribution and Log as the link function.
Ordinal Response.
Ordinal logistic. Specifies Multinomial (ordinal) as the distribution and Cumulative logit as the link function.
Ordinal probit. Specifies Multinomial (ordinal) as the distribution and Cumulative probit as the link function.
Counts.
Poisson loglinear. Specifies Poisson as the distribution and Log as the link function.
Negative binomial with log link. Specifies Negative binomial (with a value of 1 for the ancillary parameter) as the distribution and Log as the link function. To have the procedure estimate the value of the ancillary parameter, specify a custom model with Negative binomial distribution and select Estimate value in the Parameter group.
15. Model Types (continued)
Binary Response or Events/Trials Data.
Binary logistic. Specifies Binomial as the distribution and Logit as the link function.
Binary probit. Specifies Binomial as the distribution and Probit as the link function.
Interval censored survival. Specifies Binomial as the distribution and Complementary log-log as the link function.
Mixture.
Tweedie with log link. Specifies Tweedie as the distribution and Log as the link function.
Tweedie with identity link. Specifies Tweedie as the distribution and Identity as the link function.
Custom. Specify your own combination of distribution and link function.
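The ancillary parameter of the negative binomial controls how much extra variance it allows beyond the Poisson. A minimal sketch, assuming the common parameterization Var(Y) = μ + kμ² with ancillary parameter k (we believe this matches SPSS, but check the SPSS algorithms documentation); the function names are hypothetical:

```python
def poisson_variance(mu: float) -> float:
    """Poisson variance function: Var(Y) = mu."""
    return mu

def negbin_variance(mu: float, k: float) -> float:
    """Negative binomial variance, assuming Var(Y) = mu + k * mu**2 with ancillary k >= 0."""
    if k < 0:
        raise ValueError("ancillary parameter k must be non-negative")
    return mu + k * mu ** 2

mu = 4.0
print(poisson_variance(mu))      # 4.0
print(negbin_variance(mu, 1.0))  # 4.0 + 1.0 * 16.0 = 20.0 (k fixed at 1, as in the preset)
print(negbin_variance(mu, 0.0))  # k = 0 reduces to the Poisson variance: 4.0
```

Under this parameterization, fixing k = 1 (as the "Negative binomial with log link" preset does) is a strong assumption about overdispersion, which is why the custom model option to estimate k can be worthwhile.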
16. Model Types (8 Custom Distributions)
Normal
Inverse Gaussian
Gamma
Multinomial
Binomial
Poisson
Negative Binomial
Tweedie
17. DISTRIBUTIONS
Normal
Inverse Gaussian
18. DISTRIBUTIONS
Gamma
Binomial
19. DISTRIBUTIONS
Poisson
Negative Binomial
20. Distributions
Tweedie
The Tweedie distribution requires a parameter, p, which the researcher enters to determine the shape of the distribution:
p=0: normal distribution
p=1: Poisson distribution
1< p< 2: for continuous data with exact zeros (the default in SPSS is 1.5)
p=2: gamma distribution
p>2: for positive continuous data
Multinomial
Dependent has a finite number of categories, has text string values, or is ordinal.
The distribution among categories, not shown, is arbitrary.
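The Tweedie parameter p is the power in the variance function Var(Y) = φμ^p (φ is the scale parameter), which is why the special cases listed above line up as they do; p = 3 gives the inverse Gaussian. A minimal sketch; the helper names are hypothetical:

```python
def tweedie_variance(mu: float, p: float, phi: float = 1.0) -> float:
    """Tweedie variance function: Var(Y) = phi * mu**p."""
    return phi * mu ** p

def tweedie_case(p: float) -> str:
    """Name the special case (or data type) that a Tweedie power p corresponds to."""
    if p < 0:
        raise ValueError("p < 0 is outside the cases covered here")
    if 0 < p < 1:
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    if p == 0:
        return "normal"
    if p == 1:
        return "Poisson"
    if p < 2:
        return "compound Poisson-gamma (continuous data with exact zeros)"
    if p == 2:
        return "gamma"
    if p == 3:
        return "inverse Gaussian"
    return "positive continuous data"

# Variance at mu = 2.0 grows with p: 1.0, 2.0, about 2.83, 4.0, 8.0
for p in (0, 1, 1.5, 2, 3):
    print(p, tweedie_case(p), tweedie_variance(2.0, p))
```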
21. 15 Custom Link Functions
Normal, gamma, inverse Gaussian, Poisson and Tweedie distributions:
Identity
Log
Power
22. 15 Custom Link Functions (continued)
Negative binomial distributions
Negative binomial
Binomial distributions
Logit
Probit
Complementary log-log
Negative log-log
Log complement
Odds power
Multinomial distributions
Cumulative logit
Cumulative Probit
Cumulative Cauchit
Cumulative complementary log-log
Cumulative negative log-log
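The binomial links are standard transformations of a probability μ onto the scale of the linear predictor. A minimal Python sketch of five of them with their inverses, using the textbook formulas (we assume SPSS uses the same definitions; odds power is omitted because it needs an extra parameter). The dictionary name is illustrative:

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

# name -> (link g: mu -> eta, inverse link: eta -> mu), for a probability 0 < mu < 1
BINOMIAL_LINKS = {
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "complementary log-log": (lambda mu: math.log(-math.log(1 - mu)),
                              lambda eta: 1 - math.exp(-math.exp(eta))),
    "negative log-log": (lambda mu: -math.log(-math.log(mu)),
                         lambda eta: math.exp(-math.exp(-eta))),
    # log complement maps (0, 1) onto negative eta only
    "log complement": (lambda mu: math.log(1 - mu),
                       lambda eta: 1 - math.exp(eta)),
}

for name, (link, inverse) in BINOMIAL_LINKS.items():
    eta = link(0.3)
    print(f"{name}: eta = {eta:.4f}, back-transformed mu = {inverse(eta):.4f}")
```

Logit and probit are symmetric around μ = 0.5 (both give η = 0 there), whereas complementary log-log and negative log-log are asymmetric, which can suit outcomes where probabilities approach 0 and 1 at different rates.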
23. Data for Analysis
The data are taken from the SPSS 18.0 sample file ships.sav.
The aim is to study the effect of ship type, year of construction, and period of operation on the number of damage incidents.
To run a Generalized Linear Models analysis, from the menus choose:
Analyze > Generalized Linear Models > Generalized Linear Models...
24. Analyze > Generalized Linear Models > Generalized Linear Models...
On the Type of Model tab, specify the distribution of the dependent variable and the link function.
On the Response tab, select a dependent variable.
On the Predictors tab, select factors and covariates for use in predicting the dependent variable. (Factors are categorical predictors; they can be numeric or string and Covariates are scale predictors; they must be numeric)
On the Model tab, specify model effects using the selected factors and covariates.
The remaining tabs are Estimation, Statistics, EM Means, Save, and Export.
25.
The Type of Model tab appears:
Select Poisson log-linear as the type of model. This specifies a Poisson distribution with a log link function.
Click the Response tab:
Select Number of damage incidents as the dependent variable.
Click the Predictors tab:
Select Ship type, Year of construction, and Period of operation as factors.
Select Logarithm of aggregate months of service as the offset.
Click Options. Select Descending as the category order for factors.
Click Continue
Click OK
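What SPSS estimates in this step can be sketched outside SPSS. Below is a minimal NumPy implementation of a Poisson loglinear model with an offset, fitted by iteratively reweighted least squares (IRLS, which coincides with Fisher scoring for this canonical link); it is not necessarily the exact algorithm SPSS runs. The data are simulated: the single 0/1 factor, the exposures and the coefficients are hypothetical stand-ins for ship type and aggregate months of service, not values from ships.sav. The offset enters the linear predictor with its coefficient fixed at 1, so the model describes incidents per month of service.

```python
import numpy as np

def fit_poisson_irls(X, y, offset=None, tol=1e-10, max_iter=100):
    """Poisson GLM with a log link, fitted by iteratively reweighted least squares.

    X      : (n, p) design matrix including a column of ones for the intercept
    y      : (n,) observed counts
    offset : (n,) known term added to the linear predictor with coefficient 1
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    offset = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)

    mu = y + 0.5                 # starting means (keeps log(mu) finite when y = 0)
    eta = np.log(mu)
    beta = np.zeros(p)
    for _ in range(max_iter):
        # Log link: d(eta)/d(mu) = 1/mu and Poisson V(mu) = mu, so the IRLS weight is mu
        z = eta - offset + (y - mu) / mu   # working response with the offset removed
        w = mu
        XtW = X.T * w                      # scales each observation's column by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)                   # inverse of the log link
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu

# Simulated data (hypothetical, not ships.sav): one 0/1 factor and an exposure
rng = np.random.default_rng(0)
n = 500
ship_type = rng.integers(0, 2, size=n)
months = rng.uniform(100, 5000, size=n)
X = np.column_stack([np.ones(n), ship_type])
true_beta = np.array([-6.0, 0.7])
y = rng.poisson(np.exp(X @ true_beta + np.log(months)))

beta_hat, mu_hat = fit_poisson_irls(X, y, offset=np.log(months))
print(beta_hat)  # should be close to [-6.0, 0.7]
```

Exponentiating a coefficient gives a rate ratio: exp(0.7) is about 2, so in this simulation the factor level coded 1 has roughly twice as many incidents per month of service.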
26.
Click the Model tab
Select type (Ship type), construction (Year of construction), and operation (Period of operation) as main effects in the model.
Click the Estimation tab.
Select Pearson chi-square as the method for estimating the scale parameter.
Click the EM Means tab
Select type (Ship type) and construction (Year of construction) as terms to display means for and select Pairwise as the contrast for each.
Select Compute means for linear predictor as the scale.
Select Sequential Sidak as the adjustment method.
Click the Save tab.
Select Predicted value of linear predictor and Standardized deviance residual. These values are saved to the active dataset and can help you diagnose any problems with the model fit.
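For a Poisson model, the Pearson chi-square scale estimate and the deviance residuals selected above have simple closed forms. A minimal NumPy sketch; the small y and mu arrays are made-up values, not output from the ships analysis, and the residuals here are unstandardized (SPSS's standardized version further rescales them, roughly by the estimated scale and each case's leverage). The helper names are hypothetical:

```python
import numpy as np

def pearson_scale(y, mu, n_params):
    """Pearson chi-square / residual df: a moment estimate of the scale for a Poisson model."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)  # Poisson variance function V(mu) = mu
    return pearson_chi2 / (len(y) - n_params)

def poisson_deviance_residuals(y, mu):
    """sign(y - mu) * sqrt(2 * [y*log(y/mu) - (y - mu)]), taking y*log(y/mu) = 0 when y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)           # avoids log(0); those terms are zeroed below
    y_log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    unit_deviance = 2.0 * (y_log_term - (y - mu))
    return np.sign(y - mu) * np.sqrt(np.maximum(unit_deviance, 0.0))

y = np.array([0, 2, 5, 1])
mu = np.array([0.5, 2.0, 3.0, 1.5])
print(pearson_scale(y, mu, n_params=2))  # (0.5 + 0 + 4/3 + 1/6) / (4 - 2), about 1.0
print(poisson_deviance_residuals(y, mu))
```

A Pearson scale near 1 is consistent with the Poisson assumption; values well above 1 point to overdispersion, and estimating the scale this way inflates the standard errors accordingly.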
27. Scatter Plot
To produce a scatter plot of Standardized Deviance Residual by Predicted Value of the Linear Predictor, from the menus choose:
Graphs > Chart Builder...
Select the Scatter/Dot gallery and choose Simple Scatter.
Select Standardized Deviance Residual as the y variable and Predicted Value of the Linear Predictor as the x variable.
Click OK.