
GOODNESS OF FIT TESTS IN

LOGISTIC REGRESSION

by

David C. Hallett

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Community Health, University of Toronto

© Copyright by David C. Hallett, 1999

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

GOODNESS OF FIT TESTS IN LOGISTIC REGRESSION

Master of Science

David C. Hallett

Graduate Department of Community Health

University of Toronto, 1999

Abstract

The statistical analysis of dichotomous outcome variables is often interpreted with the use of logistic regression methods. After fitting a model to the observed data, one of the next essential steps is to investigate how well the proposed model fits the observed data. One method which is used to determine the suitability of the fitted logistic model is a goodness of fit test statistic. Theoretical background, advantages and disadvantages of 6 selected goodness of fit statistics will be examined in detail in this thesis. The logistic regression goodness of fit tests will be examined by performing logistic regression on several randomly generated data sets. This thesis will attempt to determine the different characteristics, strengths and weaknesses of the goodness of fit statistics.

Contents

1 INTRODUCTION
1.1 RATIONALE
1.2 ORGANIZATION
1.3 THE SIMPLE LINEAR LOGISTIC REGRESSION MODEL
1.4 FITTING THE LINEAR LOGISTIC MODEL TO BINARY DATA
2 DESCRIPTION OF GOODNESS OF FIT METHODS
2.1 TESTS BASED ON COVARIATE PATTERNS
2.1.1 Pearson's Chi-Squared $X^2$ and Deviance $D$
2.2.1 Hosmer and Lemeshow's $\hat{C}_g^*$
2.2.2 Hosmer and Lemeshow's $\hat{H}_g^*$
2.3.1 Brown's Score Statistic $\hat{B}$
2.3.2 Stukel's Score Test $\hat{S}$
2.3.3 Tsiatis
2.4 SMOOTHED RESIDUAL TESTS
2.4.1 le Cessie and van Houwelingen's $\hat{T}_c$
2.4.2 Hosmer and Lemeshow's $\hat{T}$
2.6 LOGISTIC REGRESSION DIAGNOSTICS
2.6.1 Pregibon's Diagnostics
3 SIMULATION STUDY
3.1 DESIGN OF THE EXPERIMENT
3.2 RANDOMLY GENERATED DATA
3.3 COMPUTATIONAL DETAILS
3.3.1 Logistic Regression on Randomly Generated Data
3.3.2 Calculation of Pearson's $X^2$ and Deviance $D$
3.3.3 Calculation of Brown's Score Statistic $\hat{B}$
3.3.4 Calculation of Hosmer and Lemeshow's $\hat{C}_g^*$
3.3.5 Calculation of Hosmer and Lemeshow's $\hat{H}_g^*$
3.3.6 Calculation of le Cessie and van Houwelingen's $\hat{T}_c$
3.4 MASTER DATA SET USED FOR COMPARISONS
3.5 METHODS FOR COMPARISON
3.5.1 Proportion Rejected, P-values, Percent Agreement
3.5.2 Power
4 SIMULATION RESULTS
4.1 MAXIMUM LIKELIHOOD CONVERGENCE PROBLEMS
4.2 PROPORTIONS REJECTED FROM GOODNESS OF FIT STATISTICS
4.4 SPECIAL CASES
4.4.1 Chi-Squared Predictor Variable
4.4.2 Predictor Variables with Higher Variances
4.5 PERCENT AGREEMENT BETWEEN GOODNESS OF FIT STATISTICS
4.6 PERCENT AGREEMENT IN SPECIAL CASES
4.7 POWER
5 APPLICATION OF GOODNESS OF FIT TESTS TO REAL DATA
6 DISCUSSION AND CONCLUSIONS
APPENDIX A
ADDITIONAL TABLES
ANALYSIS OF VARIANCE TABLES FOR SUMMARIZED DATA
PERCENT REJECTED IN SPECIAL CASES
ANALYSIS OF VARIANCE FOR PERCENT AGREEMENT
APPENDIX B
SAS PROGRAMS
SAS MACRO TO GENERATE DATA SETS
SAS MACRO TO GENERATE DATA SETS FOR POWER CALCULATIONS
SAS MACRO FOR LE CESSIE AND VAN HOUWELINGEN'S $\hat{T}_c$

List of Tables

2.1 Example data set to calculate $D$ and $X^2$ using binary notation
2.2 Grouping of example data set to form binomial notation required for $D$ and $X^2$
2.3 Observed and expected frequencies in each group of risk for calculating $\hat{C}_g^*$
2.4 Results from calculation of $\hat{C}_g^*$ using 6 different software packages
2.5 Observed and expected frequencies in each group of risk for calculating $\hat{H}_g^*$
2.6 Data set of size 50 to calculate $\hat{C}_g^*$ and $\hat{H}_g^*$
2.7 Observed and expected values by group to calculate $\hat{C}_g^*$
2.8 Observed and expected values by group to calculate $\hat{H}_g^*$
Family of models as proposed by Prentice
Factorial arrangement for generated data
Special cases for generated data
Entire simulation design by case
Size and elements of the generated bivariate observations
Percentages of rejecting the null hypothesis
Likelihood ratio chi-squared values for factors predicting rejections
Percentages of rejecting the null hypothesis in special cases
Overall observed and expected percent agreement in all observations
4.5a Observed and expected percent agreement, data sets of size 20
4.5b Observed and expected percent agreement, data sets of size 50
4.5c Observed and expected percent agreement, data sets of size 100
4.5d Observed and expected percent agreement, data sets of size 200
Observed and expected percent agreement, data sets of size 500
Power to detect alternative link functions
P-values for goodness of fit tests applied to the SLE study with 40 covariate patterns
P-values for goodness of fit tests applied to the SLE study with 12 covariate patterns
P-values for goodness of fit tests applied to the SLE study with 6 covariate patterns
Analysis of variance with continuous outcome variable summarizing replicates
Percentages of rejecting the null hypothesis in case 28
Percentages of rejecting the null hypothesis in case 29
Percentages of rejecting the null hypothesis in case 5
Percentages of rejecting the null hypothesis in case 6
Percentages of rejecting the null hypothesis in case 7
Percentages of rejecting the null hypothesis in case 8
F-values for factors predicting the percent agreement of pairs of GOF tests
Observed and expected percent agreement between all 6 goodness of fit tests
A.10 F-values for factors predicting the percent agreement of all 6 GOF tests
A.11 Overall observed and expected percent agreement for cases 25-27
A.12 Overall observed and expected percent agreement for case 28
A.13 Overall observed and expected percent agreement for case 29

List of Figures

3.1 Comparison of the normal response model versus the logistic response model
3.2 Comparison of the Cauchy response model versus the logistic response model
3.3 Comparison of the normal response model versus the logistic response model
4.1 Percent rejects $H_0$ across all factors
4.2 Percent $\hat{B}$ rejects $H_0$ across all factors
4.3 Percent $D$ rejects $H_0$ across all factors
4.4 Percent rejects $H_0$ across all factors
4.5 Percent $\hat{H}_g^*$ rejects $H_0$ across all factors
4.6 Percent $X^2$ rejects $H_0$ across all factors
5.1 Histogram of age of onset of SLE

Chapter 1

Introduction

1.1 Rationale

The statistical analysis of dichotomous outcome variables is frequently interpreted with the use of logistic regression methods. The multiple logistic regression model is a commonly applied procedure for describing the relationship between a dichotomous outcome variable, such as presence or absence of disease, and a number of independent variables known as potential risk factors. It is an effective method to estimate the odds ratio, a comparison of the presence of a potential risk factor for disease in a sample of diseased subjects and non-diseased subjects, while accounting for confounding and interacting variables.

After fitting a model to the observed data, one of the next essential steps is to investigate how well the proposed model fits the observed data. A model is said to fit poorly if the model's residual variation is large, systematic, or does not follow the variability postulated by the model (Hosmer et al., 1997). There are numerous ways in which the logistic model fit can be inadequate, the most important of which involves a problem with the linear component (Collett, 1991). In the case where important predictor variables or interaction terms are omitted from the postulated model, the resulting logistic model fit would be poor due to an incorrect linear component. The same situation arises when transformations of predictor variables are required for a better fit, yet they are not carried out. Influential observations and outliers can also play a part in achieving a poor fit.

A function of the response probability known as the link function is related to a linear combination of the predictor variables (Collett, 1991). Another potential error in model fitting could arise if the logistic transformation is chosen as the link function when in fact an alternative link function would be more appropriate, such as the probit or complementary log-log link functions.

the predictor variable called covariate patterns $j$ of size $m_j$ (see discussion of covariate patterns in section 2.1.1), to form binomial proportions $y_j/m_j$. One of the underlying assumptions in fitting a logistic model is that the $y_j$ have a binomial distribution with mean $m_j\pi_j$ and variance $m_j\pi_j(1-\pi_j)$. If the binary observations that make up the observed proportions are not independent, then the variance of $y_j$ will not be $m_j\pi_j(1-\pi_j)$, and the binomial assumption would be invalid, resulting in a poor fit.

One of the most common situations involving the violation of the binomial assumption occurs when the binary observations within a covariate pattern are positively correlated, resulting in overdispersion, or extra-binomial variation (Collett, 1991). In this situation, the variance will be greater than $m_j\pi_j(1-\pi_j)$, and the residual mean deviance (deviance is described in section 2.1.1) is larger than it should be. A much less common situation occurs when the binary observations within a covariate pattern are negatively correlated, resulting in underdispersion. Since an incorrectly specified linear component, an incorrectly specified link function, or outliers may also cause the residual mean deviance to rise, it is important to rule out these three problems before inferring that overdispersion exists.

If the predicted values produced by the logistic model accurately reflect the observed values, then the logistic model may be a good fit for the given data. One method which is used to determine the suitability of the fitted logistic model in describing the probability of an outcome is to use an overall summary measure which results in a single value that suggests whether the model is appropriate or not. A class of such measures comprises the goodness of fit or lack of fit test statistics.

In the context of logistic regression, goodness of fit tests are designed to determine the adequacy or inadequacy of the fitted logistic model in describing the relationship between the outcome variable and the potential risk factors. The purpose of the goodness of fit test is to determine whether the model fits the data; if it does not, conclusions may be incorrect or misleading. If the model is adequate, then we may proceed. Otherwise, we need to search for a more suitable model, one that will be more useful in explaining the outcome variable.

In spite of the fact that several goodness of fit statistics have been proposed over the past twenty years, none of them is considered to be the uniformly best statistic in assessing the suitability of the logistic regression model. Each proposed test statistic has its advantages and drawbacks. It is the intent of this thesis to determine the strengths and weaknesses of selected goodness of fit tests, and the factors that influence them. This will involve a critical assessment of the various methods for assessing goodness of fit in logistic regression for the case of a single predictor variable. Selected test statistics will be examined further to determine how they perform under various conditions. Although logistic regression diagnostics are an integral part of model adequacy, they will not be assessed in this thesis.

The logistic regression goodness of fit tests will be examined by performing logistic regression on several randomly generated data sets. The data will be randomly generated using the following plan:

Randomly generate predictor variable $X$ with different means, variances, and sample sizes

The probability $\pi(x_1)$ of $X$ leading to a positive binary outcome will be calculated from the logistic model, such that the parameters $\beta_0$ and $\beta_1$ vary to produce varying levels of the prevalence of the outcome variable $Y$

An independent uniform random variable will be generated and compared to $\pi(x_1)$ to determine the outcome variable $Y$ (0 or 1)
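The three-step plan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_binary_data`, the normal predictor, the parameter values, and the rule "$Y = 1$ when the uniform draw falls below $\pi(x)$" are hypothetical choices; the simulations in this thesis were carried out with the SAS macros listed in Appendix B.

```python
import math
import random


def simulate_binary_data(n, mean, sd, beta0, beta1, seed=0):
    """Generate n (x, y) pairs following the three-step plan (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Step 1: draw the predictor X (a normal distribution is assumed here).
        x = rng.gauss(mean, sd)
        # Step 2: probability of a positive outcome under the logistic model.
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        # Step 3: compare an independent uniform draw with pi(x).
        # Assumed rule: Y = 1 when the uniform value is below pi(x).
        u = rng.random()
        y = 1 if u < p else 0
        data.append((x, y))
    return data


# Hypothetical settings, chosen only for illustration.
sample = simulate_binary_data(n=200, mean=0.0, sd=1.0, beta0=-1.0, beta1=0.5)
prevalence = sum(y for _, y in sample) / len(sample)
```

Changing `beta0` and `beta1` changes the prevalence of the outcome, which is how the plan produces data sets with varying proportions of events.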

After the data have been generated, logistic regression models will be fitted, and the frequencies of rejecting the null hypothesis of an adequate logistic model for the selected goodness of fit tests will be reported and compared amongst tests. It will also be determined when the various statistics agree in terms of rejecting or failing to reject a good fit of the logistic model. The power of each of the selected goodness of fit tests will also be examined.

1.2 Organization

This thesis is organized as follows. The remainder of chapter 1 describes the simple linear logistic regression model. Chapter 2 reviews the theoretical background, advantages and disadvantages of the goodness of fit statistics, which will be examined in detail in this thesis. Chapter 2 also describes other statistics that are not goodness of fit tests as well as goodness of fit tests that will not be examined thoroughly. Chapter 3 explains the rationale and design of the simulation of data, as well as the methods of calculation of the test statistics to be compared. Chapter 4 displays and interprets the results of the goodness of fit tests. Chapter 5 applies the goodness of fit tests to a real data set, and illustrates their performance under a "real world" situation. Concluding remarks are made in Chapter 6.

1.3 The Simple Linear Logistic Regression Model

Regression analysis is one of the most commonly used methods to determine the relationship between an outcome variable and a predictor variable. The most typical example of regression analysis is linear regression modeling, where the outcome variable is continuous. Alternatively, when the outcome variable is binary, logistic regression analysis can be applied. The same general principles that are used in linear regression are applied to logistic regression, although there are some differences. In logistic regression, the conditional mean is bounded between 0 and 1, rather than $-\infty$ and $\infty$, as in linear regression. The other prominent difference is that the conditional distribution of the outcome variable is binomial rather than normal. A description of the logistic regression model follows.

Suppose that we have $n$ binary observations of the form $y_i$, $i = 1, 2, \ldots, n$. Let $Y$ denote a dichotomous outcome variable, which may assume values "1" if the event occurs, and "0" otherwise. Let the vector $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ denote a set of $p$ predictor variables. The logistic model which relates the probability of the event occurring to the predictor variables $\mathbf{x}$ is given by:

and thus

The variables in $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ can be discrete, binary, or continuous. In this thesis, the situation to be considered is the univariate situation where $p = 1$ predictor variable, and thus $\mathbf{x} = (x_1)$. The values of $x_1$ will be generated using the normal, uniform, and a special case of the chi-squared distribution. In the strictly univariate situation involving only one predictor variable, equation [1.1] becomes:

$$\pi(x_1) = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)} \qquad [1.2]$$

After performing the logit transformation on $\pi(x_1)$ in equation [1.2], we obtain the following simple linear logistic model:

$$g(x_1) = \operatorname{logit}(\pi(x_1)) = \log\left[\frac{\pi(x_1)}{1 - \pi(x_1)}\right] = \beta_0 + \beta_1 x_1 \qquad [1.3]$$

where log denotes the natural logarithm. The linear logistic model is a member of the class of generalized linear models (Nelder and Wedderburn, 1972). This class of generalized linear models allows $\pi(x_1)$ to be related to the linear component $(\beta_0 + \beta_1 x_1)$ by the use of a logistic link function.
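The relationship between the response probability and the linear component can be checked numerically. The short sketch below assumes the usual logistic and logit functions; the function names, the parameter values and the value of `x1` are illustrative only and are not taken from the simulations.

```python
import math


def logistic(eta):
    """Map a linear predictor eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Natural log of the odds p / (1 - p); the inverse of logistic()."""
    return math.log(p / (1.0 - p))


# Illustrative parameter values and predictor value (assumptions).
beta0, beta1 = -1.34, 0.15
x1 = 4.0

eta = beta0 + beta1 * x1   # linear component
pi_x1 = logistic(eta)      # response probability
recovered = logit(pi_x1)   # applying the logit returns the linear component
```

Because the logit is the inverse of the logistic function, `recovered` equals `eta` up to floating-point rounding, which is the sense in which the logit link relates the probability to the linear component.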

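1.4 Fitting the Linear Logistic Model to Binary Data

In fitting a univariate linear logistic model to a given set of data, the two unknown parameters $\beta_0$ and $\beta_1$ from [1.3] are estimated using the method of maximum likelihood. Since the observations are assumed to be independent, the likelihood function is given by

$$L(\beta_0, \beta_1) = \prod_{i=1}^{n} \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i} \qquad [1.4]$$

where $x_i$ denotes the value of the predictor variable for the $i$th observation. The estimation of $\beta_0$ and $\beta_1$ requires the maximization of the likelihood function or, equivalently, the maximization of the natural logarithm of the likelihood function, denoted by:

$$\ln[L(\beta_0, \beta_1)] = \sum_{i=1}^{n} \left\{ y_i \ln[\pi(x_i)] + (1 - y_i) \ln[1 - \pi(x_i)] \right\} \qquad [1.5]$$

One approach to the maximization of [1.5] involves the differentiation of $\ln[L(\beta_0, \beta_1)]$ with respect to $\beta_0$ and $\beta_1$, and setting the two resulting equations to zero:

$$\sum_{i=1}^{n} [y_i - \pi(x_i)] = 0 \qquad [1.6]$$

and

$$\sum_{i=1}^{n} x_i\,[y_i - \pi(x_i)] = 0 \qquad [1.7]$$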

One of the methods that can be used to solve this system of equations involves iterative computing methods. The solutions to equations [1.6] and [1.7] are called the maximum likelihood estimates of $\beta_0$ and $\beta_1$, denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$. The maximum likelihood estimate of $\pi(x_1)$ estimates the conditional probability in [1.2] that an event occurs, and is denoted by

$$\hat{\pi}(x_1) = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 x_1)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 x_1)} \qquad [1.8]$$

$\hat{\pi}(x_1)$ is also called the fitted or predicted value for the logistic regression model. One of the properties resulting from [1.6] is that the sum of the fitted values is equal to the sum of the observed values:

$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{\pi}(x_i)$$

The estimated logit function is consequently denoted by:

$$\hat{g}(x_1) = \hat{\beta}_0 + \hat{\beta}_1 x_1$$
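One iterative method for solving the two likelihood equations is Newton-Raphson. The following sketch is a minimal illustration under stated assumptions, not the routine used by any particular software package: the function name `fit_logistic`, the fixed number of iterations, the zero starting values and the small data set are all hypothetical, and no safeguards for non-convergence or separated data are included.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for a one-predictor logistic model (illustrative sketch)."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(iterations):
        s0 = s1 = 0.0          # score: derivatives of the log-likelihood
        i00 = i01 = i11 = 0.0  # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += x * (y - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Solve the 2x2 system (information) * step = score.
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1


# Hypothetical data with overlapping outcomes, so finite estimates exist.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(xs, ys)
fitted = [1.0 / (1.0 + math.exp(-(b0_hat + b1_hat * x))) for x in xs]
```

At convergence the score is zero, so `sum(fitted)` agrees with `sum(ys)`, which illustrates the property that the fitted values sum to the observed values.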

Chapter 2

Description of Goodness of Fit Methods

After fitting the logistic model to a set of data, it is reasonable to determine how well the fitted values under the model compare with the observed values. The sections in this chapter that follow will describe numerous goodness of fit tests, of which 6 will be examined thoroughly in the remaining chapters. These 6 goodness of fit tests are described in sections 2.1.1, 2.2.1, 2.2.2, 2.3.1, and 2.4.1.

2.1 Tests based on Covariate Patterns

2.1.1 Pearson's Chi-Squared $X^2$ and Deviance $D$

In logistic regression, there are various possible measures to compare the overall difference between the observed and fitted values. Two of the most commonly used goodness of fit measures, which are readily available in commercial software, are the deviance $D$ and Pearson's chi-squared $X^2$ goodness of fit test statistics. Before describing these two test statistics, a brief description of covariate patterns will be given.

The term used to represent a set of values for the explanatory variables for each subject is called a covariate pattern, which will be denoted by $j$ ($j = 1, 2, \ldots, J$). If each subject in a sample of observations has a unique set of values, then the number of covariate patterns is equal to the number of observations in the sample and $J = n$. Such a pattern is common when the explanatory variables from a sample are continuous and very precise, resulting in binary logistic regression analysis. If each subject in a sample does not have a unique set of values, then the number of covariate patterns will be less than the number of observations in the sample and $J < n$. In the latter situation, the number of subjects with $X = x_j$, that is, the number of observations with covariate pattern $j$, is denoted by $m_j$ ($j = 1, 2, \ldots, J$), such that $\sum_{j=1}^{J} m_j = n$. The number of events for each $m_j$ is denoted by $y_j$, and it follows that $\sum_{j=1}^{J} y_j$ is equal to the total number of events. In the situation where $J < n$ and each $m_j > 1$, the binary outcomes can be grouped or summed to obtain a binomial random variable. In this circumstance, the analysis is called binomial logistic regression. When the binary responses are grouped, the fitted values are calculated for each covariate pattern, and depend on the estimated probability for that covariate pattern. The fitted value for covariate pattern $j$, $\hat{y}_j$, is expressed by

$$\hat{y}_j = m_j\,\hat{\pi}(x_j) \qquad [2.1]$$

where $\hat{\pi}(x_j)$ is the estimate of $\pi(x_j)$ for covariate pattern $j$.
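The grouping of binary observations into covariate patterns can be illustrated as follows. This is a sketch only: the function name `group_covariate_patterns` and the nine observations are hypothetical, and are not the data of Table 2.1.

```python
from collections import defaultdict


def group_covariate_patterns(xs, ys):
    """Return one (x_j, m_j, y_j) triple per distinct covariate pattern."""
    counts = defaultdict(lambda: [0, 0])  # x_j -> [m_j, y_j]
    for x, y in zip(xs, ys):
        counts[x][0] += 1  # m_j: number of subjects with X = x_j
        counts[x][1] += y  # y_j: number of events among those subjects
    return [(x, m, y) for x, (m, y) in sorted(counts.items())]


# Hypothetical binary data with n = 9 observations.
xs = [1, 1, 2, 2, 2, 3, 3, 4, 4]
ys = [0, 1, 0, 0, 1, 1, 0, 1, 1]

patterns = group_covariate_patterns(xs, ys)
J = len(patterns)                       # number of covariate patterns
n = sum(m for _, m, _ in patterns)      # the m_j sum to the sample size
events = sum(y for _, _, y in patterns) # the y_j sum to the number of events
```

Given an estimated probability for each pattern, the fitted value for pattern $j$ would then be the product of $m_j$ and that probability.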

The ensuing discussion leading to the deviance $D$ and Pearson's chi-squared $X^2$ goodness of fit tests involves the binomial logistic regression setting. The likelihood function for the strictly binary case was defined in [1.4] and [1.5]. The likelihood function in the binomial setting is defined as (Collett, 1991):

$$L(\beta_0, \beta_1) = \prod_{j=1}^{J} \binom{m_j}{y_j}\, \pi(x_j)^{y_j}\,[1 - \pi(x_j)]^{m_j - y_j} \qquad [2.2]$$

When the unknown parameters $\beta_0$ and $\beta_1$ are set to their maximum likelihood estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, the value of the likelihood function can summarize the extent to which the sample data are fitted by the current or fitted model, denoted by $\hat{L}_c$. Since $\hat{L}_c$ is not independent of the number of observations in the sample, it is not useful on its own in determining model adequacy (Collett, 1991). Under the fitted model, the likelihood can be written as:

$$\hat{L}_c = \prod_{j=1}^{J} \binom{m_j}{y_j}\, \hat{\pi}(x_j)^{y_j}\,[1 - \hat{\pi}(x_j)]^{m_j - y_j} \qquad [2.3]$$

where $\hat{\pi}(x_j) = \hat{y}_j / m_j$ are the fitted proportions within the $j$th covariate pattern.

The full or saturated model is one that has the same number of unknown parameters as observations. This is the model for which the fitted values coincide with the observed values, resulting in a model that fits the data perfectly. The maximum likelihood for the full model is denoted by $\hat{L}_f$. Under the full model, the likelihood can be written as:

$$\hat{L}_f = \prod_{j=1}^{J} \binom{m_j}{y_j}\, \tilde{\pi}(x_j)^{y_j}\,[1 - \tilde{\pi}(x_j)]^{m_j - y_j} \qquad [2.4]$$

where $\tilde{\pi}(x_j) = y_j / m_j$ are the observed proportions within the $j$th covariate pattern.

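The comparison of the likelihoods of the fitted model and full model can consequently indicate how accurately the fitted model represents the observed data (Collett, 1991). It is more practical to compare the likelihoods by multiplying the logarithm of their ratio by -2, which results in the deviance:

$$D = -2\log(\hat{L}_c / \hat{L}_f) = -2(\log \hat{L}_c - \log \hat{L}_f) \qquad [2.5]$$

Therefore, [2.5] becomes

$$D = 2\sum_{j=1}^{J} \left[ y_j \log\left(\frac{y_j}{\hat{y}_j}\right) + (m_j - y_j) \log\left(\frac{m_j - y_j}{m_j - \hat{y}_j}\right) \right] \qquad [2.6]$$

which compares the observed values $y_j$ with the fitted values $\hat{y}_j$ for covariate pattern $j$. More explicitly, the deviance can be expressed in terms of deviance residuals, where the deviance residual is defined as (Hosmer and Lemeshow, 1989):

$$d_j = \pm\left\{ 2\left[ y_j \log\left(\frac{y_j}{\hat{y}_j}\right) + (m_j - y_j) \log\left(\frac{m_j - y_j}{m_j - \hat{y}_j}\right) \right] \right\}^{1/2} \qquad [2.7]$$

where the sign is the same as the sign of $(y_j - \hat{y}_j)$. The deviance chi-squared goodness of fit statistic is the sum of squares of the deviance residuals, denoted by:

$$D = \sum_{j=1}^{J} d_j^2 \qquad [2.8]$$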

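From [1.5], in the binary logistic regression setting, the likelihood under the fitted model is

$$\log \hat{L}_c = \sum_{i=1}^{n} \left\{ y_i \log \hat{\pi}(x_i) + (1 - y_i) \log[1 - \hat{\pi}(x_i)] \right\}$$

Under the full model, since $\tilde{\pi}(x_i) = y_i$, and the only two values for $y_i$ are 0 and 1, this means that $y_i \log y_i$ and $(1 - y_i)\log(1 - y_i)$ will both be 0. The result is that the deviance $D$ is reduced to

$$D = -2\sum_{i=1}^{n} \left\{ \hat{\pi}(x_i)\operatorname{logit}[\hat{\pi}(x_i)] + \log[1 - \hat{\pi}(x_i)] \right\}$$

which does not compare the observed values with the fitted values, and thus it is invalid as a goodness of fit test in the case where there are $J = n$ covariate patterns.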
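The deviance for grouped data can be computed directly from the observed counts and the fitted probabilities. The sketch below is illustrative only: the function names, the grouped counts and the fitted probabilities are hypothetical, and terms of the form $0 \log 0$ are taken to be 0, as in the discussion of the full model above.

```python
import math


def xlogy(a, b):
    """Return a * log(a / b), taking the value 0 when a is 0."""
    return 0.0 if a == 0 else a * math.log(a / b)


def deviance(patterns, fitted_probs):
    """Deviance D for covariate patterns given as (m_j, y_j) pairs."""
    total = 0.0
    for (m, y), p in zip(patterns, fitted_probs):
        y_hat = m * p  # fitted value for the pattern
        total += xlogy(y, y_hat) + xlogy(m - y, m - y_hat)
    return 2.0 * total


# Hypothetical grouped data (m_j, y_j) and hypothetical fitted probabilities.
grouped = [(2, 1), (3, 1), (2, 1), (2, 2)]
probs = [0.3, 0.4, 0.6, 0.8]
D_grouped = deviance(grouped, probs)

# Strictly binary case (every m_j = 1): D equals -2 times the fitted log-likelihood.
binary = [(1, 0), (1, 1), (1, 1)]
binary_probs = [0.2, 0.7, 0.9]
D_binary = deviance(binary, binary_probs)
minus_two_loglik = -2.0 * (math.log(0.8) + math.log(0.7) + math.log(0.9))
```

In the binary case the contribution of the full model vanishes, so the deviance carries no information beyond the fitted log-likelihood, which is the difficulty described above.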

For a particular covariate pattern, the Pearson residual (Hosmer and Lemeshow, 1989) is defined as:

$$r_j = \frac{y_j - m_j\hat{\pi}(x_j)}{\sqrt{m_j\hat{\pi}(x_j)[1 - \hat{\pi}(x_j)]}}$$

and therefore the Pearson chi-squared goodness of fit statistic is defined as:

$$X^2 = \sum_{j=1}^{J} r_j^2$$

Once again, in the strictly binary case, Pearson's $X^2$ becomes (McCullagh and Nelder, 1989):

$$X^2 = n$$

Since the sample size is not a helpful goodness of fit measure, $X^2$ is also invalid as a goodness of fit test in the case where there are $J = n$ covariate patterns.

Under the assumption that the fitted model is the correct one, both the Pearson $X^2$ and the deviance $D$ goodness of fit statistics have asymptotic chi-squared distributions with $J - p - 1$ (number of covariate patterns - number of unknown parameters in the fitted model excluding the intercept - 1) degrees of freedom. For the deviance, the asymptotic distribution makes intuitive sense since the deviance is the likelihood ratio statistic for the full or saturated model with $J$ parameters relative to the fitted model with $p + 1$ parameters.
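The calculation of both statistics from grouped data can be sketched in a few lines. The sketch below assumes the standard binomial forms of the deviance and Pearson statistics (the displayed equations are not legible in this copy), and the function name and argument names are illustrative only.

```python
import math


def deviance_and_pearson(y, m, pi_hat):
    """Return (D, X2) for grouped binomial data.

    y[j]      observed number of events in covariate pattern j
    m[j]      number of subjects in covariate pattern j
    pi_hat[j] fitted probability for pattern j (strictly between 0 and 1)
    """
    def xlogy(a, b):
        # Convention 0 * log(0 / b) = 0, so empty cells contribute nothing.
        return 0.0 if a == 0 else a * math.log(a / b)

    deviance = 0.0
    pearson = 0.0
    for yj, mj, pj in zip(y, m, pi_hat):
        fitted = mj * pj  # fitted number of events in pattern j
        deviance += 2.0 * (xlogy(yj, fitted) + xlogy(mj - yj, mj - fitted))
        pearson += (yj - fitted) ** 2 / (mj * pj * (1.0 - pj))
    return deviance, pearson


def degrees_of_freedom(n_patterns, n_covariates):
    # J - p - 1, as stated in the text.
    return n_patterns - n_covariates - 1
```

With $J = 4$ covariate patterns and $p = 1$ covariate, `degrees_of_freedom(4, 1)` gives the 2 degrees of freedom used in the example that follows.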

An elementary hypothetical numerical example of sample size 9 is presented to demonstrate the ability to calculate the Pearson and deviance statistics by hand. Table 2.1 illustrates the binary notation of the data where the first column represents the predictor variable $x_i$ and the second column represents the outcome variable $y_i$. Table 2.2 displays the binomial notation, after the binary data has been grouped.

Table 2.1 - Example data set of sample size 9 using binary notation to illustrate deviance $D$ and Pearson's $X^2$ goodness of fit tests.

The data set has the following properties:

$J = 4$ covariate patterns

$df = 4 - 1 - 1 = 2$

Table 2.2 - Grouping of example data to form binomial notation required to calculate deviance $D$ and Pearson's $X^2$ goodness of fit tests.

The maximum likelihood estimates are $\hat{\beta}_0 = -1.34$, $\hat{\beta}_1 = 0.15$

Using equations [2.7] and [2.8], we obtain the following values for the deviance chi-squared statistic:

Using equations [2.10] and [2.11], we obtain the following values for Pearson's chi-squared statistic:

A disadvantage of $D$ and $X^2$ is that in the case where there are $n$ covariate patterns, $J = n$, the deviance and Pearson statistics are uninformative about the goodness of fit of a model because they only depend on the fitted probabilities and sample size respectively, as indicated by [2.10] and [2.13]. Since the asymptotic distributions of these test statistics are based on the assumption that the number of observations in each cell tends to infinity, p-values calculated for these two statistics when $J \approx n$, using the $\chi^2_{J-p-1}$ distribution, are erroneous (Hosmer and Lemeshow, 1989).

One of the advantages of these two statistics is that they are usually part of the standard output of most statistical software packages. Another useful characteristic of $D$ and $X^2$ is the elementary calculation of the statistics and the associated p-value. Although $D$ and $X^2$ will usually have different values, caution should be taken when the difference between the two statistics is large. Such situations may indicate that the chi-squared approximations to the distribution of $D$ and $X^2$ are not satisfactory (Collett, 1991).

The deviance statistic is generally preferred to the Pearson statistic, especially when logistic models are fitted by the method of maximum likelihood, because the maximum likelihood estimates of the success probabilities maximize the likelihood function for the fitted model, and the deviance is minimized by these estimates (Collett, 1991). Another advantage of the deviance statistic over the Pearson statistic is that the deviance can be used to compare a sequence of hierarchical models whereas the Pearson statistic cannot be used in such a way (Collett, 1991).

Amongst several competing goodness of fit tests in a strictly binary setting which did not include the deviance, Hosmer et al. (1997) concluded that one of the primary goodness of fit tests should be the Pearson chi-squared test. They used a scaled chi-squared distribution, where the p-value was based on estimated mean and variance of $X^2$.

2.2 Tests with Grouping based on Estimated Probabilities

Aside from the use of the Pearson and deviance statistics, few goodness-of-fit tests have been developed and adopted as part of the standard output in statistical software. Hosmer and Lemeshow developed seven test statistics involving grouping based on estimated probabilities obtained from the fitted logistic model, and grouping with respect to fixed pre-determined cutoff points (Hosmer and Lemeshow, 1980). Only two of the seven formulated statistics, $\hat{C}_g^*$, which is based on grouping the data according to estimated probabilities, and $\hat{H}_g^*$, which is based on grouping the data according to fixed cutoff points, are ever used by the authors, even though they originally concluded that two of the other seven statistics were the preferred goodness of fit tests. Moreover, in Hosmer and Lemeshow's text describing logistic regression, the only statistic defined and adopted as part of output in statistical software from the seven is $\hat{C}_g^*$. The remaining five statistics that are not as feasible as $\hat{C}_g^*$ and $\hat{H}_g^*$ require the assumptions underlying the application of discriminant function analysis, discriminant function estimates, or numerical integration.

The tests proposed by Hosmer and Lemeshow (1980; 1989; 1982; Hosmer et al., 1997) were based on binary logistic regression rather than binomial logistic regression, and therefore did not require fewer covariate patterns than observations. However, covariate patterns do ultimately assist in deciding which values are grouped, since the grouping process does not separate observations within covariate patterns. One disadvantage of grouping the data into tables is that important deviations from fit due to a small number of individual data points may be unnoticed.

2.2.1 Hosmer and Lemeshow's $\hat{C}_g^*$

The calculation of the Hosmer and Lemeshow goodness-of-fit statistic $\hat{C}_g^*$, often denoted by $\hat{C}$, is based on the grouping of estimated probabilities $[\hat{\pi}(x_1), \hat{\pi}(x_2), ..., \hat{\pi}(x_n)]$ obtained from the fitted logistic model. The first group contains approximately the smallest $n/G$ values of $\hat{\pi}(x_i)$ and the second group contains approximately the second smallest $n/G$ values of $\hat{\pi}(x_i)$, etc., where $n$ represents the size of the sample and $G$ represents the total number of groups. The expectation would be that more subjects with the event would fall into the upper groups of risk while those subjects without the event would fall into the lower groups of risk. The following notation based on table 2.3 denotes the observed number of subjects who have had the event occur and not occur respectively in each group $g$ ($g = 1, 2, ..., G$):

where $n_g$ is the number of observations in group $g$.

The expected number of subjects who have the disease present and absent respectively is denoted by

$\hat{C}_g^*$ is simply obtained by calculating the Pearson chi-square statistic from the $2 \times G$ table (table 2.3) of observed and expected frequencies, and is denoted by:

In calculating the expected values $e_g$, if the predicted probabilities are less than 0.0001 or greater than 0.9999, one option is to change them to 0.0001 or 0.9999 respectively. $\hat{C}_g^*$ uses all the available subjects, those with and without the "disease" present. Simulations have verified that when $J = n$, if the number of covariates plus one is less than the number of groups ($p + 1 < G$), and the fitted logistic regression model is the correct one, then the distribution of $\hat{C}_g^*$ is approximated by a chi-square distribution with $(G - 2)$ degrees of freedom (Lemeshow and Hosmer, 1982). If $J \approx n$, the authors claim that $\hat{C}_g^*$ would likely have the same distribution, although this had never been investigated (Hosmer and Lemeshow, 1989). There was no mention of the case where $J < n$, which is the case being examined in this thesis.
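The grouping and the $2 \times G$ Pearson calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular package: the function name is hypothetical, groups are formed by simple rank order, and tied probabilities (observations in the same covariate pattern) may be split across groups, which the procedure in the text avoids.

```python
def hosmer_lemeshow_c(y, pi_hat, groups=10):
    """Return (statistic, df) for grouping by ranked estimated probabilities.

    y      list of 0/1 outcomes
    pi_hat list of fitted probabilities, same order as y
    """
    # Optional truncation of extreme probabilities, as mentioned in the text.
    p = [min(max(pi, 0.0001), 0.9999) for pi in pi_hat]
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    used = 0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # skip empty groups when n < groups
        n_g = len(idx)
        o1 = sum(y[i] for i in idx)   # observed events
        e1 = sum(p[i] for i in idx)   # expected events
        o0 = n_g - o1                 # observed non-events
        e0 = n_g - e1                 # expected non-events
        stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
        used += 1
    return stat, used - 2
```

Because different rules for placing the cut-points change which observations fall in each group, two implementations of this sketch can return different values for the same fitted model.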

The assigning of estimated probabilities into groups of risk ensures that there are a fair number of subjects in each group. However, the actual values of the estimated probabilities of developing the disease are discarded. Due to the fact that the calculation of $\hat{C}_g^*$ is dependent upon how the cut-points are specified, it can be unstable. To illustrate the instability of $\hat{C}_g^*$, in their most recent work on goodness of fit tests, Hosmer et al. (1997) performed logistic regression on a data set involving low birth weight (Hosmer and Lemeshow, 1989) using 6 different software packages. All six packages produced the same fitted model yet the calculated $\hat{C}_g^*$ values were different, resulting in six different p-values as seen in table 2.4.
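The way a p-value follows from a value of the statistic and its degrees of freedom can be sketched without any statistical library, since the chi-square upper tail has a closed form when the degrees of freedom are even. The function name below is illustrative; the closed form is a standard identity and is not taken from the thesis.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


for value in (18.11, 13.08, 11.83):
    print(value, round(chi2_sf_even_df(value, 8), 3))
```

With 8 degrees of freedom, the three values used here are those reported for BMDPLR, LOGXACT and SAS, and the computed tail probabilities agree with the corresponding p-values to three decimal places.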

Table 2.4 - Results from Hosmer et al. (1997) calculation of $\hat{C}_g^*$ using 6 different software packages

Statistical Software    $\hat{C}_g^*$    df    P-value
BMDPLR                  18.11            8     0.020
LOGXACT                 13.08            8     0.109
SAS                     11.83            8     0.159
STATA                   12.59            8     0.127
STATISTIX               13.11            8     0.147
SYSTAT                  14.70            8     0.065

Since different software packages have their own algorithms to determine which cut-points will define the groups of risk, different conclusions, either rejecting or failing to reject the null hypothesis of a good fit, could be made depending on the magnitude of the differences across software packages.

2.2.2 Hosmer and Lemeshow's $\hat{H}_g^*$

The second goodness-of-fit test to be described from Hosmer and Lemeshow (1980) is $\hat{H}_g^*$, often denoted by $\hat{H}$. It incorporates the formation of groups based on fixed cutoff points, which are pre-specified values of the estimated probability computed from the fitted model. Table 2.5 illustrates the groups of risk where the observed $o_g$ and expected $e_g$ frequencies are the same as in equations [2.6] and [2.7], with the exception of the grouping method.

Table 2.5 - Observed and expected frequencies in each group of risk for calculating $\hat{H}_g^*$

Although the cut-points are arbitrary, oftentimes the groups of risk are classified into ten groups similar to the grouping shown in table 2.5. $\hat{H}_g^*$ is calculated in exactly the same fashion as $\hat{C}_g^*$:

The distribution of $\hat{H}_g^*$ is approximated by a chi-square distribution with $(G - 2)$ degrees of freedom.
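Grouping by fixed cut-points can be sketched in the same way as grouping by ranks; only the rule that assigns an observation to a group changes. The sketch assumes equally spaced cut-points at multiples of $1/G$ with a probability on a boundary placed in the upper group, and it drops empty groups; the function name and the boundary convention are assumptions, not the thesis's definitions.

```python
def hosmer_lemeshow_h(y, pi_hat, groups=10):
    """Return (statistic, df) using equally spaced fixed cut-points.

    Fitted probabilities must lie strictly between 0 and 1.
    """
    cells = {}
    for yi, pi in zip(y, pi_hat):
        g = min(int(pi * groups), groups - 1)  # index of the risk group
        n_g, o1, e1 = cells.get(g, (0, 0, 0.0))
        cells[g] = (n_g + 1, o1 + yi, e1 + pi)
    stat = 0.0
    for n_g, o1, e1 in cells.values():
        e0 = n_g - e1  # expected non-events in the group
        o0 = n_g - o1  # observed non-events in the group
        stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    # Only occupied groups are counted, so fewer than `groups` may be used.
    return stat, len(cells) - 2
```

When the fitted probabilities cluster in a narrow range, only a few of the fixed groups are occupied, which is the situation in which fewer groups end up being used.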

Comparisons between $\hat{C}_g^*$ and $\hat{H}_g^*$

The only difference between the derivation of $\hat{C}_g^*$ and $\hat{H}_g^*$ is the manner in which the table is constructed. In the original paper, Hosmer and Lemeshow (1980) did several simulations under 21 different situations with data sets of size 150, 175, and 200 to determine the behavior of all of their test statistics. They declared that the most feasible statistics for testing goodness of fit in logistic regression were neither $\hat{H}_g^*$ nor $\hat{C}_g^*$. However, in ensuing research (Hosmer and Lemeshow, 1982; Hosmer et al., 1997), they only used $\hat{H}_g^*$ and $\hat{C}_g^*$, not their other statistics, when comparing their goodness of fit test statistics to others. Initially, computer simulations showed that $\hat{H}_g^*$ was more powerful than $\hat{C}_g^*$ and was deemed the preferred statistic (Lemeshow and Hosmer, 1982). However, it was later remarked that the grouping method based on percentiles of the estimated probabilities, $\hat{C}_g^*$, was preferable to the one based on fixed cut-points, $\hat{H}_g^*$, because the former seemed to comply better with the hypothesized chi-squared distribution (Hosmer and Lemeshow, 1988; 1989). In calculating $\hat{H}_g^*$, in situations where estimated probabilities do not fall into each of the pre-specified groups, fewer groups can be used. Similarly, in calculating $\hat{C}_g^*$, if there are fewer covariate patterns with an abundance of replication, there will be fewer groups. For example, if ten groups were initially specified, and two of the groups had no entries in them, eight or nine groups could then be used to ensure entries in each group. $\hat{C}_g^*$ would usually have more evenly distributed frequencies in each cell as compared to $\hat{H}_g^*$. In their latest paper, Hosmer et al. (1997) recommended that $\hat{C}_g^*$ and $\hat{H}_g^*$ be used to confirm lack of fit after using other goodness of fit tests.

To illustrate the ease of calculating $\hat{C}_g^*$ and $\hat{H}_g^*$, a simple numerical example is presented. Table 2.6 displays the predictor variable $x_i$, the outcome variable $y_i$, the fitted probabilities $\hat{\pi}_i$, and the groupings based on $\hat{C}_g^*$ and $\hat{H}_g^*$ from a data set of sample size 50.

Table 2.6 - Data set of size 50 used to calculate $\hat{C}_g^*$ and $\hat{H}_g^*$

Table 2.7 displays the observed and expected values produced by SAS software, which enable $\hat{C}_g^*$ to be calculated. From [2.15] and table 2.6, we obtain

Table 2.7 - Observed and expected values by group to calculate $\hat{C}_g^*$

From [2.16], we obtain

df = 7, p-value = 0.255

The groups of risk used to calculate $\hat{H}_g^*$ were created by using the cut-points in table 2.5. Table 2.8 summarizes the data required to calculate $\hat{H}_g^*$. From [2.15] and table 2.6, we obtain

Table 2.8 - Observed and expected values by group to calculate $\hat{H}_g^*$

From [?.I l , we obtain

df = 8, p-value = 0.402

If it was decided to have 9 groups rather than 10 groups in calculating $\hat{H}_g^*$, the cut-points would be in increments of 1.0/9 = 0.11, resulting in $\hat{H}_g^*$ = 9.37, p-value = 0.312, which is closer to the values obtained in calculating $\hat{C}_g^*$. This indicates that when the groupings are vastly different, the values for $\hat{C}_g^*$ and $\hat{H}_g^*$ will be further apart, accounting for conflicts in results between the two test statistics.

2.3 Score Test Statistics

2.3.1 Brown's Score Statistic $\hat{B}$

Brown developed a score test statistic which essentially compares two fitted models. The approach embeds the logistic model into a more general parametric family of models (Prentice, 1976) in which the logistic model results from certain parameters taking on particular values (Brown, 1982). The general family of models is defined by:

where $m_1, m_2 > 0$, $B(m_1, m_2)$ is the Beta function defined as:

and $\pi(x_i)$ is as defined in [1.1]. When $m_1 = m_2 = 1$, we obtain the following:

and thus the logistic model is attained. In the univariate setting, that is with $p = 1$ predictor variable, the following values of $m_1$ and $m_2$ result in the ensuing family of models:

Models used in calculating Brown's score

Model                    $m_1$        $m_2$
Logistic                 1            1
Normal                   $+\infty$    $+\infty$
Extreme Minimum Value    1            $+\infty$
Extreme Maximum Value    $+\infty$    1

The assumption of the general family of models in [2.18] permits a statistical test of the goodness of fit of the specific logistic model (the null hypothesis being $H_0: m_1 = m_2 = 1$) to the general parametric family (Brown, 1982). The asymptotic distribution of the score statistics for the parameters in the general model provides the basis for the statistical procedure.

The score statistics of the parameters are the partial derivatives of the log likelihood $\ln L$ ($L(\beta)$ as defined in [1.5] with $P_B(x_i)$ in place of $\pi(x_i)$) of the observed data with respect to each parameter. In the univariate logistic model setting, the score statistics are defined by:

$\frac{\partial \ln L}{\partial \beta_0} = \sum_i (y_i - P_B(x_i)) = t_1$

$\frac{\partial \ln L}{\partial m_1} = \sum_i (y_i - P_B(x_i)) \left[ 1 + \frac{\log P_B(x_i)}{1 - P_B(x_i)} \right] = s_1$

The score statistics $s' = (s_1, s_2)$ form the basis of Brown's goodness of fit test; they are asymptotically jointly normally distributed. A test of $H_0: m_1 = m_2 = 1$ (acceptability of the logistic model) yields the following Brown's score test statistic:

$\hat{B} = s' C^{-1} s$

where $C$ is the estimated covariance matrix of $s$ such that

= z'I, k a ( p + 1) x 2 covlrLncc mat& of t and s'

In the univariate logistic setting with $p = 1$ predictor variable, the following equations are obtained:

log PB ( 4 f P B ( x l ) ~ E ( x l l ( l + roi ]

If the predicted probabilities are less than 0.0001 or greater than 0.9999, one option is to add or subtract 0.0001 respectively. Brown used simulations of sample size 50, 100, and 200 to show that $\hat{B}$ has an approximate chi-squared distribution with 2 degrees of freedom. Alternative link functions were used to investigate the power of $\hat{B}$. It was shown that $\hat{B}$ had poor power when the alternative model was the cumulative normal distribution function, which belongs to the general family in [2.18]. It had better power when the alternative model was the extreme value distribution, which belongs to the general family in [2.18]. The highest power was against the Cauchy distribution alternative model, which is not part of the general family in [2.18]. He also found that the power of $\hat{B}$ was dependent on the sample size of the data.

Brown's score statistic is part of the standard output in BMDPLR software, and it is relatively easy to compute. Brown (1982) originally specified its applicability to continuous covariates, but did not mention categorical covariates or a mixture of categorical and continuous covariates. Since Brown's score statistic is a test of the adequacy of fit of the logistic model relative to the specified general parametric family of models and not all other possible models, it has been said that it is not considered to be an overall goodness of fit test (le Cessie and van Houwelingen, 1991). However, as noted above, its highest power was against an alternative model not in the general parametric family. It was also implied that $\hat{B}$ would have low power against excessive patterns of deviation between the observed and fitted data that are randomly distributed throughout the range of responses.

2.3.2 Stukel's Score Test $\hat{S}_T$

Stukel proposed a goodness of fit score test which has a similar approach to Brown's score test. It is based on the comparison of the logistic model to a more general family of models (Stukel, 1988). A generalized logistic model was proposed which uses a logit function with two additional parameters, resulting in the linear logistic model when the two parameters are equal to 0. The general form of the proposed model is

$\text{logit}(\pi(x_i)) = h_\alpha(\beta_0 + \beta_1 x_i) = g(x_i)$

where the $h_\alpha(\beta_0 + \beta_1 x_i)$ are strictly increasing nonlinear functions of $(\beta_0 + \beta_1 x_i)$ indexed by two shape parameters, $\alpha_1$ and $\alpha_2$, such that for $\beta_0 + \beta_1 x_i \le 0$, which is equivalent to $\pi(x_i) \le \frac{1}{2}$, since

$g(x_i) = \log[\pi(x_i)/(1 - \pi(x_i))] \le 0$

$\Rightarrow \pi(x_i)/(1 - \pi(x_i)) \le 1$

$\Rightarrow \pi(x_i) \le \frac{1}{2}$

we obtain the following:

-1 (ail80 +PtxD-i h, = a, e,

The logistic model is achieved when $\alpha_1 = \alpha_2 = 0$ and thus the proposed score test is a test of $H_0: \alpha_1 = \alpha_2 = 0$, and is denoted by

where $r' = (\partial \ln L / \partial \alpha_1, \partial \ln L / \partial \alpha_2)$ and $V$ is calculated using the Fisher information matrix. $\hat{S}_T$ has an asymptotic chi-squared distribution with 2 degrees of freedom.

Amongst several competing goodness of fit tests, Hosmer et al. (1997) concluded that $\hat{S}_T$ displayed high power under three types of departures, and should be used as a primary goodness of fit test.

2.3.3 Tsiatis

Tsiatis (1980) proposed a goodness of fit test statistic, which has an asymptotic chi-squared distribution. The statistic is a quadratic form of the difference between observed and expected counts. The test statistic is based on the partitioning of the covariate space into separate regions prior to calculating the statistic. The covariate space $z_1, z_2, ..., z_m$ is partitioned into $k$ regions denoted by $R_1, R_2, ..., R_k$. A categorical variable with $k$ levels is introduced into the model corresponding to the $k$ groups. Using the following as the logistic model:

$\text{logit}(\pi(x_i)) = \log[\pi(x_i)/(1 - \pi(x_i))] = \beta' z + \gamma' I$

where

The null hypothesis for the goodness of fit test is $H_0: \gamma_1 = \gamma_2 = ... = \gamma_k = 0$ and the test statistic is defined as:

where $W' = (\partial \ln L / \partial \gamma_1, \partial \ln L / \partial \gamma_2, ..., \partial \ln L / \partial \gamma_k)$, $L$ is the likelihood function as defined above,

and $V = A - B C^{-1} B'$ such that

$A_{jj'} = -\partial^2 \ln L / \partial \gamma_j \partial \gamma_{j'}$ $(j, j' = 1, 2, ..., k)$

$B_{jj'} = -\partial^2 \ln L / \partial \gamma_j \partial \beta_{j'}$ $(j = 1, 2, ..., k;\ j' = 0, 1, ..., m)$

$C_{jj'} = -\partial^2 \ln L / \partial \beta_j \partial \beta_{j'}$ $(j, j' = 0, 1, ..., m)$

2.4 Smoothed Residual Tests

2.4.1 le Cessie and van Houwelingen's $\hat{T}_c$

Another method used to assess the fit of logistic regression models is the technique based on nonparametric kernel smoothing. Initially, Copas (1983) used nonparametric kernel methods to plot the observed and smoothed outcome versus the covariate. Similarly, Landwehr et al. (1984) designed graphical methods to assess the adequacy of model fit by using clusters of neighbouring points. Azzalini et al. (1989) used smoothing techniques to develop a pseudo likelihood ratio test to provide an overall goodness of fit test, as well as confidence bands to determine types of model departures in various regression settings. Fowlkes (1987) introduced logistic regression diagnostics based on smoothing, which determined the adequacy of the fitted logistic model and the types of inadequacies if they existed.

All of the smoothing techniques referenced above provided the impetus for le Cessie and van Houwelingen's (1991) goodness of fit test, which was intended for continuous covariates. The test statistic is based on nonparametric kernel estimates of the standardized residuals. Unlike the deviance and Pearson goodness of fit tests, le Cessie and van Houwelingen's goodness of fit test is based on each individual observation, rather than the covariate patterns. In the univariate logistic scenario, the standardized residuals are defined as

Under the null hypothesis of a logistic model, the standardized residuals have variance of 1. The smoothing function of the standardized residuals is a weighted average of the residuals in the neighborhood of $x_i$ and it is defined as

where $h_n$ is the bandwidth controlling the amount of smoothing and the size of the region over which the residuals are averaged. $K$ is a nonnegative symmetric bounded kernel function which determines the weighting, and it is normalized according to $\int K(z)\,dz = 1$ and $\int K(z)^2\,dz = 1$. In the original paper, it was shown that

The proposed logistic regression goodness of fit test statistic is a weighted sum of the smoothed standardized residuals, and is denoted by

where

Y - t i (X) = (1- H)(Y -n(X))+o ( d l ' ) , o,(n-"'1 = ( p -p)

$V$ is the diagonal matrix with weights $v_i = \pi(x_i)(1 - \pi(x_i))$

Wq Y, the column vector with the j element being W, /[? w 2 q r ) @

such that the kernel smoothing weight is based on the distance between subject $q$ and subject $r$, which is denoted by the weights in the $x$ space, defined as

and

where all summations are over all observations and

The advantage of $\hat{T}_c$ is that it does not require the covariates to have replication, since the test statistic is based on each individual observation, and the covariate patterns are not needed. Another benefit is that it detects deviations of the model in all directions, as opposed to statistics that group observations according to $\hat{\pi}(x_i)$. The individual elements that form $\hat{T}_c$ can be used as a diagnostic tool for ill fitted observations.

Although $\hat{T}_c$ was originally designed for continuous covariates, options are discussed for categorical covariates. In a more recent publication (le Cessie and van Houwelingen, 1995), $\hat{T}_c$ was applied to a mixture of categorical and continuous variables. Although $\hat{T}_c$ is not simple to calculate, its variance is even more laborious to compute; however, a simple to use SAS macro is available from the authors upon request. Although the selection of the kernel function does not appear to greatly affect $\hat{T}_c$, the choice of an appropriate bandwidth is crucial. The authors suggest that an appropriate bandwidth selection would be approximately $\sqrt{n}$. Based on simulations that were done using data sets of size 100 and 500, $\hat{T}_c$ was one of nine statistics compared under five different univariate models, and three additional multivariate models in a binary logistic setting with no replications, that is with $J = n$ covariate patterns. It was recommended that $\hat{T}_c$ be used as a secondary test in assessing model adequacy (Hosmer et al., 1997).

2.4.2 Hosmer and Lemeshow's $\hat{T}_r$

Following in the footsteps of le Cessie and van Houwelingen, another test statistic was described by Hosmer et al. (1997). The smoothing based statistic, denoted by $\hat{T}_r$, uses the cubic weights in the $y$ space rather than the uniform kernel weights in the $x$ space as carried out in $\hat{T}_c$ (le Cessie and van Houwelingen, 1991). The distance between subject $q$ and subject $r$ is denoted by the weights in the $y$ space, defined as

where c, is a constan& which depends on ni ; c, is chosen to satisfy wir & # O. The test statistic

is as defined in [2.29], with the $y$-space weights rather than the $x$-space weights. Similar to the performance of $\hat{T}_c$, Hosmer et al. (1997) recommended the use of $\hat{T}_r$ as a secondary test in assessing model adequacy, rather than using it as a primary goodness of fit test.

2.5 $R^2$ Type Statistics

Several $R^2$ type statistics have been proposed to investigate the goodness of fit in logistic regression (Lemeshow and Hosmer, 1982). The first simple statistic to be described is the average proportion of variation explained (AVPE). The AVPE calculates the average proportion of the variance of the probability of an event. It is denoted by the following (Gordon et al., 1979):

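AVPE = (unconditional variance of $y$ - average conditional variance of $y$) / (unconditional variance of $y$)

The conditional variance of $y$ is $\pi(x_i)[1 - \pi(x_i)]$.

The unconditional mean of $y$ is $\bar{\pi} = \frac{1}{n} \sum_i \pi(x_i)$.

The unconditional variance of $y$ is $\bar{\pi}\bar{q}$, where $\bar{q} = 1 - \bar{\pi}$. Therefore, the average proportion of variation explained becomes:

$\text{AVPE} = 1 - \frac{\frac{1}{n} \sum_i \pi(x_i)[1 - \pi(x_i)]}{\bar{\pi}\bar{q}}$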

Unfortunately, there are situations in which the denominator can be zero, resulting in an undefined AVPE. However, the situations which cause the undefined statistic are unlikely to occur in real world applications (Gordon et al., 1979). Another problem with this statistic is that the upper bound has an extensive range of values, depending on the distribution of $\pi(x_i)$ (Gordon et al., 1979).
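The AVPE can be computed directly from a set of event probabilities. The sketch below assumes the Bernoulli conditional variance $\pi(x_i)[1 - \pi(x_i)]$ averaged over the $n$ observations; the function name is illustrative, and returning `None` for the undefined case is a design choice of this sketch.

```python
def avpe(pi):
    """Average proportion of variation explained for probabilities pi."""
    n = len(pi)
    pi_bar = sum(pi) / n                    # unconditional mean of y
    unconditional = pi_bar * (1.0 - pi_bar)  # unconditional variance of y
    if unconditional == 0.0:
        return None  # denominator is zero, AVPE is undefined
    avg_conditional = sum(p * (1.0 - p) for p in pi) / n
    return 1.0 - avg_conditional / unconditional
```

When every observation has the same probability the statistic is 0, since the predictor explains none of the variation; the undefined case arises only when all probabilities are 0 or all are 1.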

Several authors have generalized $R^2$ from the usual linear regression model scenario to apply to more general models where maximum likelihood is the criterion of fit. Cox and Snell (1989), Magee (1990), and Maddala (1983) independently (Nagelkerke, 1991) proposed the following type of $R^2$:

the intercept model only.

Nagelkerke (1991) proposed a modification of $R^2$, more appropriate for logistic regression, namely

$\bar{R}^2 = R^2 / \max(R^2)$ [2.35]

where $\max(R^2) = 1 - L(0)^{2/n}$ and $L(0)$ is the likelihood of the intercept-only model.
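Both statistics can be obtained from the maximized log likelihoods that most software reports. The sketch assumes the usual likelihood-ratio form $R^2 = 1 - [L(0)/L(\hat{\beta})]^{2/n}$ for the generalized $R^2$ (the displayed equation is not legible in this copy); function and argument names are illustrative.

```python
import math


def generalized_r2(loglik_null, loglik_model, n):
    """R^2 = 1 - (L(0) / L(beta_hat))^(2/n), computed on the log scale."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Rescale the generalized R^2 by its maximum, 1 - L(0)^(2/n)."""
    max_r2 = 1.0 - math.exp(2.0 * loglik_null / n)
    return generalized_r2(loglik_null, loglik_model, n) / max_r2
```

For a binary outcome the likelihood cannot exceed 1, so the unscaled statistic cannot reach 1; dividing by its maximum restores the 0 to 1 range.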

2.6 Logistic Regression Diagnostics

2.6.1 Pregibon's Diagnostics

Hat Matrix

Pregibon (1981) extended linear regression diagnostics to logistic regression in a renowned paper. In order to derive the significant diagnostic statistics in logistic regression, it is necessary to manipulate the components of the residual sum-of-squares and the "hat" matrix for logistic regression, defined by

$H = V^{1/2} X (X'VX)^{-1} X' V^{1/2}$

where $V$ is a diagonal matrix such that the components of $V$ are $v_j = m_j \hat{\pi}(x_j)[1 - \hat{\pi}(x_j)]$ and $X$ is the "design" matrix, which is the $J \times (p + 1)$ matrix containing the values for all $J$ covariate patterns formed from the observed values of the $p$ covariates. The $j$th diagonal element of $H$ is denoted by:

$h_j = m_j \hat{\pi}(x_j)[1 - \hat{\pi}(x_j)] (1, x_j') (X'VX)^{-1} (1, x_j')' = v_j \times b_j$

where $b_j = (1, x_j') (X'VX)^{-1} (1, x_j')'$ and $\sum_j h_j$ is equal to the number of parameters. The diagonal elements of the hat matrix are effective at uncovering extreme or leverage points. If the hat matrix is expressed as an $n \times n$ matrix for binary response data (ungrouped data), then each diagonal element has an upper bound of $1/m_j$. If the hat matrix is based on data that is grouped by each covariate pattern, then each diagonal element has an upper bound of 1 (Hosmer and Lemeshow 1989). If an observation has an estimated probability that is either less than 0.1 or greater than 0.9, then the value in the hat diagonal matrix might not measure leverage as "further from the mean" corresponding to a higher value.
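For a single predictor the hat diagonals can be computed by hand, because $X'VX$ is a $2 \times 2$ matrix with an explicit inverse. The sketch below covers only this univariate case ($p = 1$) with grouped data; the function name is illustrative.

```python
def hat_diagonals(x, m, pi_hat):
    """Diagonal elements h_j = v_j * b_j of the logistic hat matrix (p = 1).

    x[j]      predictor value for covariate pattern j
    m[j]      number of subjects in pattern j
    pi_hat[j] fitted probability for pattern j
    """
    v = [mj * pj * (1.0 - pj) for mj, pj in zip(m, pi_hat)]
    # X'VX = [[a, b], [b, c]] for the design rows (1, x_j).
    a = sum(v)
    b = sum(vj * xj for vj, xj in zip(v, x))
    c = sum(vj * xj * xj for vj, xj in zip(v, x))
    det = a * c - b * b
    # b_j = (1, x_j) (X'VX)^{-1} (1, x_j)' = (c - 2 b x_j + a x_j^2) / det
    return [vj * (c - 2.0 * b * xj + a * xj * xj) / det for vj, xj in zip(v, x)]
```

The returned values sum to $p + 1 = 2$, the number of parameters, which gives a quick check on any implementation.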

Other Diagnostics

Ill-fitted observations can be inspected by looking at the difference in the Pearson and deviance chi-squared statistics when each observation is deleted. Another approach is to examine the standardized differences in the parameter estimates due to the deletion of each observation.

Chapter 3

Simulation Study

The sections that follow describe the simulation plan and the methods used in generating the data. The computational methods for the 6 goodness of fit tests to be compared are illustrated, and the techniques to compare them are described.

3.1 Design of the Experiment

The following goodness of fit test statistics were examined under different situations:

1. le Cessie and van Houwelingen's $\hat{T}_c$

3. Deviance $D$

5. Hosmer and Lemeshow's $\hat{H}_g^*$

6. Pearson's chi-squared $X^2$

The initial goal was to design a plan that would allow logistic regression to be performed on various data sets with different characteristics, then the six goodness of fit statistics would be calculated and examined under each situation. After the parameter estimates and goodness of fit statistics were calculated, logistic regression would in turn be performed on the results of the logistic regression using the factors below as the predictor variables of the probability of a rejection of each test statistic (further details can be found in section 3.5). The most effective way of achieving the aforementioned goal was to design a layout in a factorial arrangement, so that patterns or differences might be discerned between variations of the factors.

The factors under consideration that were predetermined to vary were the following:

the distribution of the predictor variable $x$ (normal or uniform)

the variance of the predictor variable $x$ (variance = 1 or 2)

the sample size of the generated data sets ($n$ = 20, 50, 100, 200, 500)

the parameters used to generate the logistic data ($\beta_0 = 0, -1.3, -3$, and $\beta_1 = 1, 2$), which produced low, moderate, and high proportion of events (proportion of $y = 1$)
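The factorial layout implied by these factors can be enumerated mechanically. In the sketch below the case numbering (normal cases first, then uniform) follows the description of the data generation in section 3.2; the ordering of variance and parameter values within each distribution is an assumption, as are all variable names.

```python
from itertools import product

distributions = ["normal", "uniform"]
variances = [1, 2]
beta0_values = [0, -1.3, -3]
beta1_values = [1, 2]
sample_sizes = [20, 50, 100, 200, 500]

# One case per combination of distribution, variance and parameters.
cases = [
    {"case": c, "distribution": d, "variance": v, "beta0": b0, "beta1": b1}
    for c, (d, v, b0, b1) in enumerate(
        product(distributions, variances, beta0_values, beta1_values), start=1
    )
]

# Every case is run at every sample size.
runs = [(case["case"], n) for case in cases for n in sample_sizes]
```

This gives 2 x 2 x 3 x 2 = 24 cases in the primary factorial arrangement, each examined at five sample sizes.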

The simulations were set up as a factorial experiment for the primary analysis, which involved the normal and uniform distributions. To determine the behavior of the goodness of fit test statistics on data sets having predictor variables with a skewed distribution, three additional cases of data were generated where the predictor variable was generated under the chi-squared distribution. Three additional cases are used to show the behaviour in data sets where the outcome variable has a low, moderate, and high prevalence rate.

As was previously mentioned, one of the factors to be considered in the rejection of the test statistics is the variance of the generated predictor variable. Since we were comparing results from the logistic regression of predictor variables with variances of 1 and 2, it was decided to investigate two more additional cases with the predictor variable having a larger variance of 5 and 25. Tables 3.1, 3.2 and 3.3 best illustrate the overall layout of the factorial arrangement as well as the five additional special cases.

Table 3.1 - Factorial arrangement for generated data

Normal Distribution: Variance (x) = 1 | Variance (x) = 2

Table 3.2 - Special cases for generated data from a skewed distribution and normal distribution with larger variance

Distribution    Variance (x)    $\beta_0$    $\beta_1$    n
Normal          25              1.3          2            20, 50, 100, 200, 500
Normal          5               -1.3         2            20, 50, 100, 200, 500
Chi-Squared     2               -4           1            20, 50, 100, 200, 500
Chi-Squared     2               -0.85        1            20, 50, 100, 200, 500
Chi-Squared     2               -2           1            20, 50, 100, 200, 500

Table 3.3 - Entire simulation design by case

[Table body not recoverable. Column headings: Category of Prevalence of $y = 1$ from $\beta_0$ and $\beta_1$ (High, Medium, Low); Distribution of $x$.]

3.2 Randomly Generated Data

The data were created using the SAS System for Windows (version 6.10). A single predictor variable was initially generated using a random function in SAS. The generated random predictor variables were controlled by a seed value, which was arbitrarily chosen. The given seed was used to obtain the first observation in the stream of the random numbers (SAS Institute Inc., 1990). For each case illustrated in section 3.1, a data set with 1,020 observations was generated to build 6 separate data sets of sizes 20, 50, 100, 150, 200, and 500. The first 20 consecutive observations were used to build the data sets of size 20, observations 21 to 70 were used to build the data sets of size 50, and so on. In the end, the data sets of size 150 were not included since there was enough diversity without the inclusion of those files. Each set of 5 randomly generated files was replicated 499 more times, with the seed value increasing by one for each replicate.
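The partitioning of the 1,020 observations can be sketched as a sequence of consecutive slices. The names `SIZES`, `KEPT` and `partition` are illustrative; the original work was carried out in SAS.

```python
SIZES = [20, 50, 100, 150, 200, 500]  # consecutive blocks, in order
KEPT = [20, 50, 100, 200, 500]        # the block of size 150 is discarded


def partition(observations):
    """Split 1,020 observations into the data sets that are retained."""
    if len(observations) != sum(SIZES):
        raise ValueError("expected 1,020 observations")
    parts = {}
    start = 0
    for size in SIZES:
        block = observations[start:start + size]
        if size in KEPT:
            parts[size] = block
        start += size
    return parts
```

Under this layout the data set of size 200 consists of observations 321 to 520 and the data set of size 500 of observations 521 to 1,020.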

Let $x_{c1}, x_{c2}, ..., x_{cn}$ denote the elements of a randomly generated random variable, which will be taken to be the predictor variable in the first 24 cases. Let $x_c \sim N(10, \sigma_c^2)$ for $c = 1, 2, ..., 12$ denote the cases for the predictor variable generated from a normal distribution, such that $x_c$ has a mean of 10 and variance $\sigma_c^2$. Let $x_c \sim U(a, b)$, $c = 13, 14, ..., 24$ denote the cases for the predictor variable generated from a uniform distribution on the interval between $a$ and $b$, such that $x_c$ also has a mean of 10 and variance $\sigma_c^2$. The sample size of the data set is denoted by

$n = 20, 50, 100, 200, 500$

where there are 500 replicates for each $n$

and $\sigma_c^2 = v$, where $v = 1, 2$ corresponds to the variance size.

In the situation of $x_c \sim N(10, \sigma_c^2)$, $x_c$ was generated as follows:

$\mu = 10$, $\sigma^2 = \sigma_c^2$, seed = $S$

$x'_c = \mu + \sigma \times N_S(0, 1)$

$x_c = \text{round}(x'_c)$ which rounds $x'_c$ to one decimal point

where $N_S(0, 1)$ is a randomly generated number from the standard normal distribution and the seed value is used to obtain the first observation in the stream of the random numbers.

In the situation of $x_c \sim U(a, b)$, $x_c$ was generated by a transformation of a random variable generated from a uniform distribution on the interval between $a$ and $b$ as follows:

$\mu = 10$, $\sigma^2 = \sigma_v^2$, seed = $S$

$x'_c = a + (b - a) \times U_S(0, 1)$

where $U_S(0, 1)$ is a randomly generated number from the uniform distribution on the interval (0, 1)

$x_c = \text{round}(x'_c)$, which rounds $x'_c$ to one decimal point

In order to generate $x_c$ such that it has a mean of 10 and a variance of $\sigma_v^2 = v$, $a$ and $b$ had to be solved from the following equations:

mean of a uniform random variable: $\frac{a + b}{2} = 10$

variance of a uniform random variable: $\frac{(b - a)^2}{12} = \sigma_v^2$

which results in

$a = 8.27$, $b = 11.73$ when $\sigma_v^2 = 1$

$a = 7.55$, $b = 12.45$ when $\sigma_v^2 = 2$

In special cases 28 and 29, $\sigma_v^2 = 5$ and 25 respectively.
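The generation of the predictor variable can be illustrated with a short Python sketch. It is not the SAS code used in the thesis: Python's `random.Random` stands in for the SAS random number functions, so a given seed will not reproduce the original stream, and the function names `uniform_bounds` and `generate_predictor` are hypothetical.

```python
import math
import random


def uniform_bounds(mean, variance):
    """Solve (a + b) / 2 = mean and (b - a)^2 / 12 = variance for a and b."""
    half_width = math.sqrt(12.0 * variance) / 2.0
    return mean - half_width, mean + half_width


def generate_predictor(n, variance, distribution, seed):
    """Return n predictor values with mean 10, rounded to one decimal place."""
    rng = random.Random(seed)  # stand-in for the SAS seeded random stream
    if distribution == "normal":
        sd = math.sqrt(variance)
        raw = [10.0 + sd * rng.gauss(0.0, 1.0) for _ in range(n)]
    elif distribution == "uniform":
        a, b = uniform_bounds(10.0, variance)
        raw = [a + (b - a) * rng.random() for _ in range(n)]
    else:
        raise ValueError("distribution must be 'normal' or 'uniform'")
    return [round(value, 1) for value in raw]


a1, b1 = uniform_bounds(10.0, 1.0)
a2, b2 = uniform_bounds(10.0, 2.0)
print(round(a1, 2), round(b1, 2))  # 8.27 11.73
print(round(a2, 2), round(b2, 2))  # 7.55 12.45
```

The two printed pairs agree with the values of $a$ and $b$ quoted above for variances 1 and 2.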

To ensure replication for each set of predictor variables in each randomly generated data set, the value of the predictor variable was rounded off to one decimal point. Within the same step that the $x_i$'s were generated, the probability of an event occurring as a result of the $x_i$'s was calculated according to the logistic model

$$P_c(x) = \frac{\exp(\beta_0 + \beta_1 (x - 10))}{1 + \exp(\beta_0 + \beta_1 (x - 10))}$$

which gives the expected proportion of events, $P_c(x)$. $P_c(x)$ was then compared to an independently randomly generated uniform number $R \sim U(0, 1)$ on the interval between 0 and 1, such that the outcome $y$ was set to 1 when $R$ fell below $P_c(x)$ and to 0 otherwise.

It should be noted that in calculating the prevalence of the outcome variable ($P_c(x)$), the predictor variable was transformed by subtracting 10, resulting in a mean of 0. This was done to ensure appropriate levels of the outcome variable; otherwise it would always be equal to 0 when the predictor variable has a mean of 10.
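The outcome generation step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis code: the function name `simulate_outcomes` is hypothetical, the coefficient values in the usage lines are chosen only for the example, and the rule "event when the uniform draw falls below the model probability" is assumed as the direction of the comparison.

```python
import math
import random


def simulate_outcomes(xs, beta0, beta1, seed, center=10.0):
    """Return (x, y) pairs with y drawn from a logistic model in (x - center)."""
    rng = random.Random(seed)  # stand-in for the SAS uniform generator
    pairs = []
    for x in xs:
        eta = beta0 + beta1 * (x - center)   # predictor centred at 10
        p = 1.0 / (1.0 + math.exp(-eta))     # logistic probability of an event
        r = rng.random()                     # independent U(0, 1) draw
        y = 1 if r < p else 0                # assumed direction of comparison
        pairs.append((x, y))
    return pairs


# Hypothetical coefficients: at x = 10 the event probability is exactly 0.5.
sample = simulate_outcomes([10.0] * 20000, beta0=0.0, beta1=1.0, seed=1)
prevalence = sum(y for _, y in sample) / len(sample)
print(0.47 < prevalence < 0.53)
```

Centring the predictor keeps the linear predictor near $\beta_0$ for typical values of $x$, which is what makes it possible to control the prevalence of the outcome through $\beta_0$ and $\beta_1$.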


The process outlined above was performed 1,020 times, resulting in a data set with 1,020 observations denoted by $\{(x_1, y_1), (x_2, y_2), \ldots, (x_{1020}, y_{1020})\}$. The next step partitioned the 1,020 observations into five individual data sets of sizes 20, 50, 100, 200 and 500 in the following manner: observations 1 to 20, 21 to 70, 71 to 170, 321 to 520 and 521 to 1,020, respectively.

As stated above, elements $\{(x_{171}, y_{171}), (x_{172}, y_{172}), \ldots, (x_{320}, y_{320})\}$ were omitted. The process described above was replicated 500 times for each data set in each scenario, producing 2,500 data sets generated for each case. Therefore, there were a total of 24 cases × 2,500 data sets = 60,000 data sets generated, 3 cases × 2,500 data sets = 7,500 for the data sets having chi-squared predictor variables, and 2 cases × 2,500 data sets = 5,000 additional data sets combined for cases 28 and 29. Therefore in total, there were 72,500 data sets created, and analyzed using logistic regression.
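The partition of the 1,020 observations can be made concrete with a small Python sketch. The names `SIZES`, `KEPT` and `partition` are hypothetical; the block boundaries follow from the description above (consecutive blocks of 20, 50, 100, 150, 200 and 500 observations, with the block of 150 discarded).

```python
SIZES = (20, 50, 100, 150, 200, 500)  # consecutive blocks of the 1,020 observations
KEPT = (20, 50, 100, 200, 500)        # the block of size 150 was not used


def partition(observations):
    """Split 1,020 observations into the five retained data sets."""
    if len(observations) != sum(SIZES):
        raise ValueError("expected exactly 1,020 observations")
    datasets = {}
    start = 0
    for size in SIZES:
        block = observations[start:start + size]
        if size in KEPT:
            datasets[size] = block
        start += size
    return datasets


# Use observation numbers 1..1020 in place of (x, y) pairs to show the ranges.
parts = partition(list(range(1, 1021)))
for size, block in parts.items():
    print(size, block[0], block[-1])
# 20 1 20
# 50 21 70
# 100 71 170
# 200 321 520
# 500 521 1020
```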

Table 3.4 Size and elements of the generated bivariate observations

Size of data set    Elements of generated data set
20                  $\{(x_1, y_1), (x_2, y_2), \ldots, (x_{20}, y_{20})\}$
50                  $\{(x_{21}, y_{21}), (x_{22}, y_{22}), \ldots, (x_{70}, y_{70})\}$
100                 $\{(x_{71}, y_{71}), (x_{72}, y_{72}), \ldots, (x_{170}, y_{170})\}$
200                 $\{(x_{321}, y_{321}), (x_{322}, y_{322}), \ldots, (x_{520}, y_{520})\}$
500                 $\{(x_{521}, y_{521}), (x_{522}, y_{522}), \ldots, (x_{1020}, y_{1020})\}$

3.3 Computational Details

3.3.1 Logistic Regression on Randomly Generated Data

Univariate logistic regression using the LOGISTIC procedure in SAS (PROC LOGISTIC) was applied to the generated data to determine the parameter estimates, and the calculation of the six goodness of fit statistics. The LOGISTIC procedure uses the iteratively reweighted least squares (IRLS) algorithm as the default method to compute the parameter estimates of $\beta_0$ and $\beta_1$ (SAS Institute Inc., 1993; 1997).

The prevalence of rejections from each of the test statistics under the different cases presented above was also calculated and stored in a separate file for each test statistic (six new data sets). For every goodness of fit test statistic investigated on each data set generated, the following items from the resulting logistic regression fit were retained and sent to a new data set:

the case number of the generated data set

actual value of each test statistic for each data set

the resulting p-value for the test statistic

indicator variable for rejecting or accepting the null hypothesis of a good logistic fit

the size of the randomly generated sample (20, 50, 100, 200, 500)

indicator variable for the proportion of the event from the randomly generated data set (low, medium, high) as a result of $\beta_0$ and $\beta_1$, which were used in generating the data from the logistic distribution

indicator variable for the distribution of the independent variable from the randomly generated data set (normal, uniform, chi-square)

$\beta_0$ and $\beta_1$, which were used in generating the data from the logistic distribution

variance of the independent variable which was used in generating the data from the logistic distribution

3.3.2 Calculation of Pearson's $X^2$ and Deviance $D$

The deviance and Pearson chi-squared statistics are part of the output in the SAS procedure LOGISTIC (PROC LOGISTIC). The two statistics can be obtained by specifying the following options in BOLD CAPS

proc logistic data = infile descending;
   model y = x / SCALE = NONE AGGREGATE;
run;

in the MODEL statement of the LOGISTIC procedure. SCALE = NONE specifies that no correction is needed for the dispersion parameter and AGGREGATE is used to define the sub-populations for calculating the Pearson and deviance statistics. Observations with common values in the given list of variables are regarded as coming from the same population (SAS Institute Inc., 1997).

By default, the values are automatically sent to the SAS OUTPUT window. In order to extract the required information for the Pearson and deviance statistics, the values which are sent to the OUTPUT window were re-routed to an external file using the SAS procedure PRINTTO (PROC PRINTTO). Once the information was in the external file, the Pearson and deviance statistics, along with their corresponding p-values and degrees of freedom, were obtained and stored in two newly created SAS data sets named deviance.sd2 and Pearson.sd2 respectively.
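To show what the aggregated statistics measure, the following Python sketch computes a Pearson chi-squared statistic and a deviance over sub-populations defined by common values of the predictor. It is not the SAS implementation: it assumes the standard binomial formulas for grouped data, takes the fitted probabilities as given (strictly between 0 and 1), and uses the hypothetical function name `pearson_and_deviance`. The degrees of freedom are taken as the number of sub-populations minus the number of fitted parameters, which is also an assumption of the sketch.

```python
import math
from collections import defaultdict


def pearson_and_deviance(xs, ys, probs, n_params=2):
    """Aggregate by distinct x and return (Pearson X2, deviance D, df)."""
    groups = defaultdict(lambda: [0, 0, 0.0])  # subjects, events, fitted probability
    for x, y, p in zip(xs, ys, probs):
        group = groups[x]
        group[0] += 1
        group[1] += y
        group[2] = p  # identical x implies identical fitted probability
    x2 = 0.0
    dev = 0.0
    for m, events, p in groups.values():
        expected = m * p
        x2 += (events - expected) ** 2 / (expected * (1.0 - p))
        if events > 0:
            dev += 2.0 * events * math.log(events / expected)
        if events < m:
            dev += 2.0 * (m - events) * math.log((m - events) / (m - expected))
    return x2, dev, len(groups) - n_params


# Toy data: three sub-populations of four subjects, fitted probability 0.5 each.
xs = [1.0] * 4 + [2.0] * 4 + [3.0] * 4
ys = [1, 1, 1, 0] * 3
x2, dev, df = pearson_and_deviance(xs, ys, [0.5] * 12)
print(round(x2, 3), round(dev, 3), df)
```

With only a handful of subjects per sub-population, as in many of the generated data sets, neither statistic can be expected to follow its nominal chi-squared distribution closely.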

3.3.3 Calculation of Brown's Score Statistic $\hat{B}$

The main component required from the logistic regression output to calculate Brown's score statistic is the predicted probability of an event response, which can be obtained by specifying the following options in BOLD CAPS

proc logistic data = infile descending;
   model y = x;
   OUTPUT OUT = OUTFILE1 PRED = PREDICTD;
run;

where OUTFILE1 is a newly created data set containing the predicted probability (variable named PREDICTD) of an event response from the fitted model. Since univariate logistic regression was used in modeling the simulated data, the elements of the matrices and vectors were determined and calculated using equations rather than using the IML SAS procedure (PROC IML), which can be used for matrix manipulation. To ensure the accuracy of the calculation of $\hat{B}$, test data sets were used to compare the results of calculated values to the results from BMDPLR, where $\hat{B}$ is part of the standard output in logistic regression. The values of $\hat{B}$ and their corresponding p-values were stored in a separate SAS data set named Brown.sd2.

3.3.4 Calculation of Hosmer and Lemeshow's $\hat{C}_g$

$\hat{C}_g$ is part of the standard output in SAS, and thus all of the computations of $\hat{C}_g$ were executed in SAS, in the following manner (SAS Institute Inc., 1997):

Observations were sorted in increasing order by their estimated probabilities from the fitted logistic model.

Observations were allocated into groups of size $M = [0.1 \times n + 0.5]$.

In observations with identical explanatory variables, "blocks" of subjects were formed.

If the first "block" of subjects was placed into the first group, the criterion for placement of the second "block" was as follows: it was added to the first group if $n_1 < M$ and $n_1 + [0.5 \times n_2] \le M$; otherwise it was placed into the second group.

In general, if the $(j-1)$th "block" of subjects was placed into the $k$th group, and $c$ is the number of subjects in the $k$th group, then subjects for the $j$th block (with $n_j$ subjects) were also placed into the $k$th group if $c < M$ and $c + [0.5 \times n_j] \le M$; otherwise they were added to the next group.

If the number of subjects in the last group was $\le [0.05 \times n]$, then the last two groups were merged to form one group.

Subjects with identical explanatory variables were not divided when being placed into groups.

where

$M$ = target number of subjects for each group

$n$ = total number of subjects

$n_1$ = number of subjects in the first "block"

$n_2$ = number of subjects in the second "block"
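The grouping rules listed above can be sketched in Python as follows. This is an illustrative reading of those rules, not the SAS source: the function name `hosmer_lemeshow_groups` is hypothetical, the square brackets are interpreted as taking the integer part, and integer arithmetic is used for $[0.1 \times n + 0.5]$, $[0.5 \times n_j]$ and $[0.05 \times n]$ to avoid floating point rounding.

```python
from itertools import groupby


def hosmer_lemeshow_groups(xs, probs):
    """Return groups (lists of observation indices) built from blocks of tied x."""
    n = len(probs)
    if n == 0:
        return []
    target = (n + 5) // 10  # M = [0.1 * n + 0.5]
    # Sort by estimated probability; ties in x stay adjacent and form a block.
    order = sorted(range(n), key=lambda i: (probs[i], xs[i]))
    blocks = [list(g) for _, g in groupby(order, key=lambda i: xs[i])]
    groups = [list(blocks[0])]
    for block in blocks[1:]:
        current = groups[-1]
        c = len(current)
        if c < target and c + len(block) // 2 <= target:
            current.extend(block)        # block joins the current group
        else:
            groups.append(list(block))   # block starts the next group
    if len(groups) > 1 and len(groups[-1]) <= n // 20:  # [0.05 * n]
        last = groups.pop()
        groups[-1].extend(last)          # merge a very small last group
    return groups


# Twenty distinct predictor values give ten groups of two observations each.
probs = [i / 21 for i in range(1, 21)]
groups = hosmer_lemeshow_groups(list(range(20)), probs)
print([len(g) for g in groups])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Because blocks are never split, heavily tied predictor values (which the rounding to one decimal point encourages) can produce fewer than ten groups, or groups of quite unequal size.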


The observations are divided into about ten groups of approximately equal sizes based on the percentiles of the estimated probabilities. The statistic used to calculate $\hat{C}_g$ is a Pearson chi-squared statistic with the number of groups minus 2 degrees of freedom. $\hat{C}_g$ can be obtained by specifying the following option in BOLD CAPS

proc logistic data = infile descending;
   model y = x / LACKFIT;
run;

The LACKFIT option in the MODEL statement of the LOGISTIC procedure is the key statement that is required. The same methods which were used to extract the Pearson and deviance statistics (using PROC PRINTTO) were used to extract $\hat{C}_g$ and its corresponding p-values, sending them to a newly created SAS data set labeled C-hat.sd2.

3.3.5 Calculation of Hosmer and Lemeshow's $\hat{H}_g$

As described in section 2.2.2, fixed cutoff points based on the estimated probabilities from the fitted model were constructed. In this study, 10 groups were created based on the pre-specified cut-points:

[0.0,0.1), [0.1,0.2), [0.2,0.3), [0.3,0.4), [0.4,0.5), [0.5,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9), [0.9,1.0]

Elsewhere, $\hat{H}_g$ has been calculated based on fewer cutoff points when no estimated probabilities fall in certain intervals (Hosmer et al., 1997). For feasibility and continuity, it was decided to always use the 10 specified groups for analysis on the generated data sets. Once again, the following options in BOLD CAPS must be specified

proc logistic data = infile descending;
   model y = x;
   OUTPUT OUT = OUTFILE1 PRED = PREDICTD;
run;

Another new file is created which calculates $\hat{H}_g$ and stores its values and their corresponding p-values. Further details can be found in the appendix. In situations where no estimated probabilities fell into certain intervals, ten groups were still utilized to calculate $\hat{H}_g$, even though fewer than ten groups have been adopted by others (Hosmer et al., 1997). The values of $\hat{H}_g$ and their corresponding p-values were stored in a separate SAS data set named H-hat.sd2.
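A fixed cut-point statistic of this kind can be sketched in Python. The sketch assumes the usual Hosmer and Lemeshow form, a Pearson-type sum over groups of (observed minus expected events) squared divided by $n_k \bar{\pi}_k (1 - \bar{\pi}_k)$, and it assumes that an interval containing no estimated probabilities simply contributes nothing to the sum. Neither assumption is taken from the appendix code, and the function name `fixed_cutpoint_statistic` is hypothetical.

```python
def fixed_cutpoint_statistic(ys, probs, n_groups=10):
    """Return (statistic, group counts) using cut-points 0.1, 0.2, ..., 0.9."""
    counts = [0] * n_groups
    observed = [0] * n_groups
    expected = [0.0] * n_groups
    for y, p in zip(ys, probs):
        k = min(int(p * n_groups), n_groups - 1)  # last interval is closed at 1.0
        counts[k] += 1
        observed[k] += y
        expected[k] += p
    stat = 0.0
    for n_k, o_k, e_k in zip(counts, observed, expected):
        if n_k == 0:
            continue  # assumption: an empty interval adds nothing
        mean_p = e_k / n_k
        stat += (o_k - e_k) ** 2 / (n_k * mean_p * (1.0 - mean_p))
    return stat, counts


# Toy data: two subjects in [0.2, 0.3) and two in [0.7, 0.8).
stat, counts = fixed_cutpoint_statistic([0, 1, 0, 1], [0.25, 0.25, 0.75, 0.75])
print(round(stat, 4), counts)
```

In the toy data each occupied interval contributes $0.25 / 0.375$, so the statistic is $4/3$; the eight empty intervals are visible in the returned counts.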

3.3.6 Calculation of le Cessie and van Houwelingen's $\hat{T}_{lc}$

As alluded to in section 2.4.1, an easy to apply SAS macro was used in calculating $\hat{T}_{lc}$. The actual SAS macro (FITTEST) can be found in the appendix. The macro must be used in conjunction with the SAS procedure LOGISTIC (PROC LOGISTIC). The following options in BOLD CAPS must be specified

proc logistic data = infile COVOUT OUTEST = BETAS descending;
   model y = x;
   OUTPUT OUT = OUTFILE1 PRED = PREDICTD;
run;

The OUTEST option creates the new data set BETAS, which contains the estimates of the regression coefficients. When specifying the COVOUT option, the estimated covariance matrix is added to the newly created BETAS data set. The following statement invokes the SAS macro FITTEST:

%FITTEST(BETAS = BETAS, PRED = PRED, DEP = y, VAR = x);

The output would be sent to the SAS OUTPUT window; however, again the same methods which were used to extract the Pearson and deviance statistics (using PROC PRINTTO) were used to extract $\hat{T}_{lc}$ and its corresponding p-values, sending them to a newly created SAS data set called T-hat.sd2.

3.4 Master data set used for comparisons

After the computations were completed, a SAS master data set was created by merging the six separate SAS data sets that were stored (Pearson.sd2, deviance.sd2, Brown.sd2, H-hat.sd2, C-hat.sd2, T-hat.sd2). The new master file was created in SAS format (master.sd2), and comprised the following information.

Case  Size  Obs  X^2  P    Rej  D    P    Rej  B-hat  P    Rej  C-hat  P    Rej  H-hat  P    Rej  T-hat  P    Rej  beta0  beta1  Prop Y  Var
1     20    1    9.9  .42  0    8.7  .41  0    3.1    .19  0    9.8    .23  0    8.5    .38  0    0.7    .23  0    0      1      Hi      1
1     20    2    8.2  .52  0    8.3  .53  0    2.5    .29  0    7.5    .46  0    6.1    .52  0    1.1    .38  0    0      1      Hi      1
1     20    3    8.1  .52  0    8.7  .41  0    3.1    .19  0    9.8    .23  0    8.5    .38  0    0.7    .23  0    0      1      Hi      1
1     20    4    7.2  .58  0    8.3  .53  0    2.5    .29  0    7.5    .46  0    6.1    .52  0    1.1    .38  0    0      1      Hi      1

3.5 Methods for comparison

3.5.1 Proportion Rejected, P-values, Percent Agreement

In order to determine the effectiveness of the goodness of fit test statistics, the aforementioned simulation results were used to make comparisons and conclusions on these tests. Proportions of rejecting the null hypothesis ($\alpha = 0.05$) of an underlying logistic model were examined. The rejection rates of the goodness of fit tests were investigated across sample sizes of the generated data and across prevalence rates of the outcome variable. The rejection rates were also examined under the different configurations of the variances and distributions of the predictor variable. In theory, since $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$, one would expect the proportion of rejections to be roughly 5% when $\alpha = 0.05$. Multiple logistic regression was the primary method of analysis, with the 4 main effects and all first order interactions as the predictor variables in the model. For example, for Hosmer and Lemeshow's $\hat{H}_g$, the estimated logit is given by the following expression:

$$\hat{\pi}(x_i) = P(\hat{H}_g \text{ rejects } H_0 : \text{adequate logistic fit} \mid X = x_i)$$

$$\begin{aligned}
\text{logit}(\hat{\pi}(x_i)) = {} & \hat{\beta}_0 + \hat{\beta}_1 (\text{sample size}) + \hat{\beta}_2 (\text{prevalence of } y = 1) + \hat{\beta}_3 (\text{variance of } x) + \hat{\beta}_4 (\text{distribution of } x) \\
& + \hat{\beta}_5 (\text{sample size}) \times (\text{prevalence of } y = 1) + \hat{\beta}_6 (\text{sample size}) \times (\text{variance of } x) \\
& + \hat{\beta}_7 (\text{sample size}) \times (\text{distribution of } x) + \hat{\beta}_8 (\text{variance of } x) \times (\text{prevalence of } y = 1) \\
& + \hat{\beta}_9 (\text{distribution of } x) \times (\text{prevalence of } y = 1) + \hat{\beta}_{10} (\text{variance of } x) \times (\text{distribution of } x)
\end{aligned} \qquad [3.3]$$

where all of the predictor variables were considered to be categorical and the parameter estimates for the interaction variables represent multiple estimates.


As tables 3.1 and 3.3 outline each case of the generated data, cases 1 and 3, 2 and 4, 5 and 7, 6 and 8, etc. were combined since they have the same properties when looking at all the factors. For example, cases 1 and 3 both have the predictor variable coming from a normal distribution with a variance equal to 1, and prevalence of outcome variable being high. Therefore, for the multiple logistic regression analysis, there are actually 60 scenarios (12 pairs of combined cases × 5 sample sizes), and with 2 × 500 = 1,000 replicates per scenario this gives 60,000 observations. The multiple logistic regression was performed by using the SAS procedure GENMOD. Likelihood ratio chi-squared statistics (SAS Institute Inc., 1997) were displayed to observe trends in factors affecting rejection rates for the goodness of fit tests. The chi-squared values are more meaningful than p-values because of the large number of observations.

The secondary method of analyzing the results from the univariate logistic regression involved summarizing the proportion of rejections over the 500 replicates for each case, which results in 120 observations (24 cases × 5 sample sizes). This method looked at all 24 cases rather than pairing 12 cases. Another advantage of summarizing the data over each set of 500 replicates is that the p-values would be more meaningful, since this approach would be based on only 120 observations. The continuous outcome variable for this analysis of variance approach was defined to be the frequency of rejections divided by the number of replicates, with the same predictor variables specified in [3.3]. Inspection of the p-values and F-values enables contrasts to be made with the results from the multiple logistic regression analysis. The analysis of variance was carried out by using the SAS procedure GLM, and the results were displayed in tables.

The percentages rejecting the null hypothesis of an adequate logistic model fit for the 6 goodness of fit tests under study were plotted to illustrate the rejection rate across each level of size of the generated data set, for the distributions and variances of the predictor variable, and prevalence of the outcome variable. This allows comparisons of plots to be made visually in order to detect trends and possible interactions between the factors affecting the rejection rates.

Although most reviews have examined the proportion of goodness of fit statistics rejecting or failing to reject the null hypothesis of a good logistic model fit, there does not appear to be much mention of the measures of agreement between these goodness of fit tests. The easily calculated percent agreement was calculated for each pair of the 6 goodness of fit tests, as well as for all 6 tests simultaneously, indicating when all of the tests rejected or failed to reject the null hypothesis.

The percent agreement is defined as:

$$PA = \frac{\text{number of concordant observations}}{\text{total number of observations}} \times 100$$
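The percent agreement defined above is simple to compute from the rejection indicators. The following Python sketch uses a hypothetical function name, `percent_agreement`, and made-up indicator values; an observation counts as concordant when every test supplied reaches the same decision, which covers both the pairwise case and the case of all 6 tests simultaneously.

```python
def percent_agreement(*decisions):
    """Percent of observations on which all supplied tests make the same decision.

    Each argument is a sequence of 0/1 indicators (1 = null hypothesis rejected),
    one entry per generated data set, in the same order for every test.
    """
    n = len(decisions[0])
    if any(len(d) != n for d in decisions):
        raise ValueError("all tests must cover the same data sets")
    concordant = sum(1 for row in zip(*decisions) if len(set(row)) == 1)
    return 100.0 * concordant / n


# Made-up rejection indicators for three tests on four data sets.
test_a = [0, 0, 1, 1]
test_b = [0, 1, 1, 1]
test_c = [0, 0, 0, 1]
print(percent_agreement(test_a, test_b))          # 75.0
print(percent_agreement(test_a, test_b, test_c))  # 50.0
```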

After the percent agreement was calculated, it was stored in the master data set (master.sd2). The expected percent agreement was also calculated and compared to the observed percent agreement. Analysis of variance was then applied to the resulting data set to determine if the predictor variables in [3.3] would affect the outcome variable, percent agreement, between each pair of the 6 goodness of fit statistics. Once again, since there were nearly 60,000 observations, the F-values rather than p-values were displayed since they are more meaningful than the p-values.

3.5.2 Power

The power to detect an incorrectly specified link function of the 6 goodness of fit tests was investigated. $H_0$ is the null hypothesis that the logistic link function is appropriate, while $H_1$ is the alternative hypothesis that the logistic link function is not appropriate. The power of the goodness of fit test is defined as Power $= P(\text{reject } H_0 \mid H_1 \text{ is true})$.

One option in investigating the power is to generate data under an alternative model, perform logistic regression on the generated data, and determine how often each goodness of fit test rejected the null hypothesis of an adequate logistic model. Any alternative model that is bounded by 0 and 1 can be used, such as any cumulative distribution function. Similar to Brown's (1982) methods, three alternative models were examined to assess the power using randomly generated data sets with the predictor variable having a uniform distribution over the interval between -3 and 3. For each of the three alternative models, Brown solved for $\beta_0$ and $\beta_1$ which satisfied the following equations:

$$\int [\pi_A(x) - \pi(x)] \, dF(x) = 0, \qquad \int x \, [\pi_A(x) - \pi(x)] \, dF(x) = 0 \qquad [3.5]$$

which correspond to the maximum likelihood conditions for an infinite sample size (Brown, 1982), such that

$\pi_A(x)$ denotes the alternative model

$\pi(x)$ denotes the estimated logistic model

$F(x)$ represents the distribution of the predictor variable

The following alternative link functions were chosen to determine the power of the goodness of fit tests.

Alternative Link Functions

The cumulative normal model, also called the probit model, is symmetric and has narrower tails than the logistic model. Substituting [3.6] into equation [3.5] resulted in the solution of $\beta_0 = 0$, $\beta_1 = 1.785$. Therefore, those values were used in generating data to perform the power calculations. Figure 3.1 shows the normal response model versus the logistic response model, and the potential difficulties in distinguishing between the two response models.
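The reported solution for the probit alternative can be checked numerically. The Python sketch below assumes that the infinite-sample maximum likelihood conditions take the usual score-equation form, $\int [\pi_A(x) - \pi(x)] \, dF(x) = 0$ and $\int x [\pi_A(x) - \pi(x)] \, dF(x) = 0$, with $\pi_A$ the standard normal distribution function, $\pi$ the logistic model and $F$ uniform on (-3, 3). All function names are hypothetical, and Simpson's rule with bisection is used only for simplicity.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function (the assumed probit alternative)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def intercept_condition(beta1, beta0=0.0):
    """First condition; the constant uniform density is omitted."""
    return simpson(lambda x: normal_cdf(x) - logistic(beta0 + beta1 * x), -3.0, 3.0)


def slope_condition(beta1, beta0=0.0):
    """Second condition, weighted by x."""
    return simpson(lambda x: x * (normal_cdf(x) - logistic(beta0 + beta1 * x)), -3.0, 3.0)


def solve_beta1(lo=0.5, hi=5.0, tol=1e-8):
    """Bisection on the slope condition with beta0 fixed at 0 by symmetry."""
    f_lo = slope_condition(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = slope_condition(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta1 = solve_beta1()
print(round(beta1, 3), abs(intercept_condition(beta1)) < 1e-9)
```

Under these assumptions the root should land close to the reported slope, and the first condition is satisfied at $\beta_0 = 0$ because both response curves are symmetric about zero.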

[Figure 3.1: Normal vs. Logistic Distribution Response Models; horizontal axis: Predictor Variable U(-3,3)]

Figure 3.1 - Comparison of the normal response model versus the logistic response model with $\beta_0 = 0$ and $\beta_1 = 1.785$ and the predictor variable coming from a uniform distribution between -3 and 3.

The Cauchy distribution is also symmetric, with wider tails than the logistic model. The Cauchy distribution which was used had a location parameter of 0 and scale parameter of 0.5. Substituting [3.7] into equation [3.5] resulted in the solution of $\beta_0 = 0$, $\beta_1 = 1.318$. Figure 3.2 demonstrates the wider tails of the Cauchy response model versus the logistic response model.

[Figure 3.2: Cauchy vs. Logistic Distribution Response Models]

Figure 3.2 - Comparison of the Cauchy response model versus the logistic response model with $\beta_0 = 0$ and $\beta_1 = 1.318$ and the predictor variable coming from a uniform distribution between -3 and 3.

The extreme value distribution, also called the complementary log-log link function, with location parameter -0.3665 and scale parameter 1 was also used as an alternative link function. The extreme value distribution is not symmetric. Substituting [3.8] into equation [3.5] resulted in the solution of $\beta_0 = 0.267$, $\beta_1 = 1.479$. Figure 3.3 exhibits the extreme value response model versus the logistic response model.

[Figure 3.3: Extreme Value vs. Logistic Distribution Response Models]

Figure 3.3 - Comparison of the extreme value response model versus the logistic response model with $\beta_0 = 0.267$ and $\beta_1 = 1.479$ and the predictor variable coming from a uniform distribution between -3 and 3.

The same methods that were used to generate the data from the first 24 cases were also used to generate the data for the power analysis. Five hundred replicates of data sets of sizes 20, 50, 100, 200, and 500 were generated. For example, after generating a uniform predictor variable on the interval between -3 and 3, the response variable for the extreme value distribution was calculated by the following equation:


Logistic regression was applied to the generated data, and the proportion of rejections was calculated for each goodness of fit test, indicating the power for each test.

The power analysis in this paper was not the main focus of attention, thus only the power to detect alternative link functions was examined. Various types of power calculations such as omission of important predictor variables, interaction terms, or transformations of variables could have been done; however, those power analyses were not performed here.

Chapter 4

Simulation Results

4.1 Maximum Likelihood Convergence Problems

In attempting to fit a logistic model, there are situations where convergence is not achieved, and no information is obtained on a particular set of data. The first situation in which there is no convergence occurs when all of the observations have the same response, and therefore no statistics are computed. The second problematic scenario occurs when there is a complete separation in the sample points, in which case the maximum likelihood estimate does not exist. In SAS version 6.12, the LOGISTIC procedure continues to give parameter estimates based on the last maximum likelihood iteration even when there is a complete separation in the sample points. Version 6.10 of SAS does not proceed to give parameter estimates.

Of the 24 cases, univariate logistic regression was performed on 60,000 data sets. Parameter estimates were obtained in 58,303 (97%) data sets. Most of the problems originated from the smaller data sets, especially those of size 20 with a low proportion of event rates (cases 9-12 and 21-24). In fact, 87% of data sets of size 20 were able to achieve maximum likelihood convergence and parameter estimates, whereas 99% convergence was achieved for data sets of size 50.
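The two situations described in this section can be detected before a model is fitted. The Python sketch below is a hypothetical pre-check for a single predictor, not part of the SAS analysis: the function name `fit_problem` is illustrative, and only complete separation is tested (quasi-complete separation, where the two outcome groups share a boundary value, is not flagged).

```python
def fit_problem(xs, ys):
    """Return a description of the convergence problem, or None if none is found."""
    if len(set(ys)) < 2:
        return "constant response"        # all observations share one outcome
    x_non_events = [x for x, y in zip(xs, ys) if y == 0]
    x_events = [x for x, y in zip(xs, ys) if y == 1]
    if max(x_non_events) < min(x_events) or max(x_events) < min(x_non_events):
        return "complete separation"      # one threshold splits the outcomes exactly
    return None


print(fit_problem([9.1, 9.8, 10.4], [0, 0, 0]))             # constant response
print(fit_problem([9.1, 9.8, 10.4, 11.0], [0, 0, 1, 1]))    # complete separation
print(fit_problem([9.1, 9.8, 10.4, 11.0], [0, 1, 0, 1]))    # None
```

Both problems become more likely as the sample size shrinks and as the event rate moves away from one half, which is consistent with the concentration of failures in the data sets of size 20 with low event rates.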

4.2 Proportions Rejected from Goodness of Fit Statistics

The percentages of rejecting the null hypothesis of an underlying logistic model, when the null hypothesis was true, from the 6 goodness of fit tests are displayed in table 4.1. The data were generated under the logistic model, and thus the proportions of rejections would be defined as the type I error (denoted by $\alpha$), which is made if the null hypothesis of an underlying logistic model is rejected when the null hypothesis is true. Since the null hypothesis is rejected if the p-value < 0.05, the rejection rates should be approximately 5 percent, if the goodness of fit tests did have their postulated chi-squared distributions.

Multiple logistic regression was performed on the results from the univariate logistic regression, testing the null hypothesis that each factor does not predict the rejection rates for each of the 6 goodness of fit tests, while accounting for each of the other factors in the model. The likelihood ratio chi-squared values are displayed in table 4.2. This will allow trends to be seen and magnitudes of effects to be observed.

Table A.1 in Appendix A displays the p-values and F-values from the analysis of variance performed on the results from the univariate logistic regression. The proportion of rejections summarized over the 500 replicates for each case was the continuous outcome variable, which resulted in 120 observations.

Figures 4.1 through 4.6 display the percentages of rejecting the null hypothesis of an adequate logistic model fit for the six goodness of fit tests under study. The plots illustrate the rejection rate across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of the outcome variable.


Table 4.2 indicates that size had the strongest effect on the goodness of fit tests. Of all 6 goodness of fit tests, Brown's score $\hat{B}$ and the deviance $D$ were the least affected by the size of the generated data set (chi-squared = 56.8, 60.0 respectively; df = 4). This is also observed in table 4.1, where the rejection rates of $\hat{B}$ appear to be steady across sizes. For $D$, there are numerous scenarios where it never rejects or only rejects the null hypothesis 1% of the time across sizes. Pearson's $X^2$ and le Cessie and van Houwelingen's $\hat{T}_{lc}$ were the most affected by sample size (chi-squared = 835.1, 771.2 respectively; df = 4), with steadily increasing rejection rates with increasing sample sizes. Hosmer and Lemeshow's $\hat{C}_g$ and $\hat{H}_g$ also display similar trends, but not to the same extent as $X^2$ and $\hat{T}_{lc}$ (chi-squared = 276.7, 210.1 respectively; df = 4). Table A.1 confirmed the above conclusions. The results from table 4.1 also show that in most situations, Brown's score was the only test statistic that rejected the null hypothesis near the theoretical 5 percent for data sets of sample size 20.

The only test statistic that was substantially affected by the prevalence of the outcome variable was $D$ (chi-squared = 638.0, df = 1), with more rejections occurring in the higher prevalence rates. $\hat{H}_g$ also had higher rejection rates at higher prevalence rates, but it was not as substantial as it is in $D$ (chi-squared = 213.5, df = 1). The analysis of variance results from table A.1 also indicate that $D$ and $\hat{H}_g$ are the most affected by prevalence of outcome, but it shows a greater effect for $\hat{H}_g$ rather than $D$.

The rejection rates from the goodness of fit tests were essentially unaffected when the variance of the predictor variable changed between 1 and 2. The only test which reported a chi-squared value above 10 was $D$ (chi-squared = 41.7, df = 1). In relative terms, $D$ did have a much larger chi-squared value than the other 5 tests, with more rejections occurring when the predictor variable had a variance of 1. Table A.1 shows that the variance of the predictor variable affects the rejections of both ... and $X^2$. However, the conclusions from table 4.2 override any differences between the two tables. The results from table 4.1 appear to agree with the results from table 4.2.

The distribution of the predictor variable did not greatly affect the rejections of the tests. However, in relative terms, $\hat{H}_g$ had a much higher chi-squared value than the other goodness of fit tests (chi-squared = 50.6, df = 1), with more rejections occurring when the predictor variable followed a normal distribution as opposed to a uniform distribution.

Most first order interactions had little or no effect on the rejection rates in the goodness of fit tests. The interaction between sample size of the generated data and prevalence of the outcome variable had the most effect, although it was not nearly as significant as some of the main effects discussed earlier. $\hat{T}_{lc}$ was the most affected by this interaction effect (chi-squared = 95.9, df = 8). Figure 4.1 indicates there may be an interaction effect, as the lines do not appear to be going in the same direction; that is, they are not approximately parallel. Lower rejection rates are seen in smaller sample sizes with low prevalence of the outcome variable, but as the sample size increases, the rejection rates are higher in the data that have higher prevalence in the outcome variable. The interaction is not too extreme in all situations, as all the rejection rates increase with sample size.

The other notable effect is seen with the interaction between sample size of the generated data and the distribution of the predictor variable in the rejection rates of $D$ (chi-squared = 110.3, df = 4). If the bottom two plots in figure 4.3 are overlaid onto the top two plots, one can see that the overlaid lines are not parallel at all. The diverging lines are best seen when looking at the rejections of data having a high prevalence of outcome variables, as the rates rise dramatically more in the uniform distribution compared to the same lines in the data with the predictor variable coming from a normal distribution.


As indicated by table 4.1 and figures 4.5 and 4.3 respectively, it would appear as though there is something inherently wrong with $\hat{H}_g$ and especially $D$, as they do not seem to reject as close to 5% as one would have anticipated, especially for data sets of size 20 and 50. A closer look reveals that if the 24 cases are inspected individually, in 17 of the 24 cases, in at least one of the size categories, $D$ rejected the null hypothesis less than 0.5% of the time, usually when the prevalence of outcomes was low or moderate. $\hat{H}_g$ rejected the null hypothesis less than 0.5% of the time in 6 of the 24 cases.

It is interesting to note that for $\hat{B}$, the percent rejected at size 100 is actually higher than the percent rejected at sizes 200 and 500, when the predictor variable followed a uniform distribution.

In exploring the behaviour of $X^2$ under the normal distribution portion of table 4.1, in all 6 comparisons from sample size 20 to 500, the rejection rates were steadily increasing. Since the rejection rates did not level off at data sets of sample size 500, it is not known whether the rejection rate would continue to rise beyond data sets of sample size 500.

The percent rejected increased steadily with the size increase for $\hat{T}_{lc}$, especially in the normal distribution portion of table 4.1 with the prevalence of the outcome variable being low and moderate. Similar to $X^2$, but not as severely, the steadily increasing rejection rate of $\hat{T}_{lc}$ approached 10% at size 500 in three scenarios, which is twice as many rejections as the theoretical type I error.

Table 4.1 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics, each level of size of the generated data set, distribution and variance of the predictor variable x, and prevalence of outcome was examined under the generated data. The results are based on 58,303 observations that are collapsed in this table.

Row labels: prevalence of y = 1 (Low, Moderate, High) and n (20, 50, 100, 200, 500). Column labels: % rejecting $H_0$ for each goodness of fit test ($X^2$, $\hat{T}_{lc}$, $\hat{B}$, $D$, $\hat{C}_g$, $\hat{H}_g$).

1.6 4.5 7.9 7.6 10.9 2.6 6.9 8.3 9.1 14.0

0.0 0.5 1.8 -.- 7 9

4 2 0.3 2.6 2.8 3.6 3.9

0.4 1.0 4.5 6.3 10.8 1.4 3.3 3.7 5.0 10.4

C; 8;

1.7 3.6 3.6 4.3 8.0 0.6 1.3 4.7 7.1 8.5 0.7 3.8 4.4 4.1 6.2 1.6 3.1 4.8 4.1

1 4.7

-.- 7 3

2.5 4.5 3.9 5.1 4.1 4.7 5.0 4.2 6.0

1.4 4.1 4.4 3.9

, 4.7 0.0 0.8 1.7 3.0

44 4.4 5.1 5.1 6.1 1.4 3,O 4.6 3.5 5.1 3.3 3.7 4.4 4.6 4.6 3.3 3.6 5.1 5.0 4.9

0.0 0.0 0.0 0.0 0.1 0.8 2.0 1.3 2.5 1.6

1 2 . 4.1

5.6 8.6 10.7 0.9 3.8 8.9 8.4

1.5 2.3 4.6 6.0 5.4 3.2 5.1 5.2 4.4 5.6

2.2 3.5 3.7 2.5 2.0 0.0 0.0 0.0 0.0 0.0 0.1 0.3 0.9 0.6 0.6 0.4 1.5 1.1 1.1 0.6

3.0 0.9 3.0 3.6 3.7 4.8 0.8 2.8 4.6

1.9 3.6 5.5 4.0 -

4.9 1.5) 3.5 4.6 6.0 7.0 2.4 3.7 4.8 5.2 8.3 1.9 5.0 5.1 4.6 4.7

11.0 2.5 7.3 8.7 11.7 12.0

. 1.4 5.6 7.5

41 1 9.8 4.7 1 178

Table 4.1 continued

Row labels: prevalence of y = 1 and n. Column labels: % rejecting $H_0$ for each goodness of fit test.

20 0.0 2.1 0.0 1 1.2 ' 0.0 0.8 50 1.5 2.9 0.0 2.7 0.0 4.9

Table 4.2 - Likelihood ratio chi-squared values for testing the null hypothesis of each factor predicting the rejections (of the null hypothesis of an adequate logistic model fit) for each of the six goodness of fit tests. The factors examined included the size of the generated data set, distribution and variance of the predictor variable x, prevalence of outcome, and all first order interactions. The results are based on 58,303 observations.

Column labels: factor, df, and chi-squared value for each goodness of fit test.

[Figure 4.1: four panels, "Percent Rejected T-Hat" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.1 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for le Cessie and van Houwelingen's $\hat{T}_c$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

[Figure 4.2: four panels, "Percent Rejected Brown" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.2 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for Brown's score $\hat{B}$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

[Figure 4.3: four panels, "Percent Rejected Deviance" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.3 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for the deviance $D$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

[Figure 4.4: four panels, "Percent Rejected C-Hat" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.4 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for Hosmer and Lemeshow's $\hat{C}_g^*$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

[Figure 4.5: four panels, "Percent Rejected H-Hat" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.5 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for Hosmer and Lemeshow's $\hat{H}_g^*$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

[Figure 4.6: four panels, "Percent Rejected Pearson" with X~N(0,1), X~U(0,1), X~N(0,2) and X~U(0,2). Horizontal axis: Sample Size of X (20, 50, 100, 200, 500). Separate lines for Low, Moderate and High Prevalence Y.]

Figure 4.6 - Percentages of rejecting the null hypothesis of an adequate logistic model fit for Pearson's chi-squared $X^2$ goodness of fit test statistic across each level of size of the generated data set, for the distributions and variances of the predictor variable x, and prevalence of outcome variable. The horizontal line represents the 5% rejection rate.

4.4 Special Cases

Although every situation could not be investigated, two glaring omissions from the first 24 cases were data sets with a predictor variable having a much larger variance and data sets with a skewed predictor distribution. Data sets from three additional scenarios, labeled cases 25 through 27 in table 3.3 and having a predictor variable with a chi-squared distribution, were generated to see if the goodness of fit tests performed similarly to the 24 cases. Two more additional cases were considered with larger variances of 5 and 25 in the predictor variable. These cases were labeled 28 and 29 respectively in table 3.3, and were comparable to cases 5 through 8, with the exception of the larger variance. Case 28 generated data sets with a covariate variance of 5, while case 29 generated data sets with predictor variables having a variance of 25.

4.4.1 Chi-Squared Predictor Variable

There was a high proportion of non-converging maximum likelihood estimates in data sets of sample size 20 when the outcome variable had a low prevalence. To prevent any biases from occurring, all of the data generated in that scenario were omitted.

The results from table 4.3 did not differ radically from table 4.1. One noticeable difference is the much lower rejection rates in $\hat{T}_c$ across all sizes, which was not seen in any situation in table 4.1. $D$ also had more rejections for the smaller sized data sets, comparable to those seen in similar situations in table 4.1. The only other notable difference in table 4.3 is the near-constant rejection rate of one of the statistics: in 11 of the 14 cells in table 4.3, the rejection rate was between 3.0 and 4.8, and overall the maximum rejection rate was 6.3.

Table 4.3 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics, the special case of the predictor variable generated with a chi-squared distribution.

Variance of x; Goodness of fit Test

4.4.2 Predictor Variables with Higher Variances

Once again, there was a high proportion of non-converging maximum likelihood estimates in data sets of sizes 20 and 50 in case 29, and thus they were omitted. The most noticeable characteristic of tables A.2 and A.3 is the lack of rejections of the null hypothesis for $D$. It is perplexing that out of 4,000 data sets generated, the deviance statistic never rejected the null hypothesis. The reason for the lack of rejections from the deviance is not related to the size of the variance of the predictor variable, since similar rates are seen in case 7 (table A.6), and especially in case 8 (table A.7), where the deviance never rejected the null hypothesis.

4.5 Percent Agreement between Goodness of Fit Statistics

Although comparisons have often been made between the rejection rates for various statistics, additional information can be acquired by investigating the conformity amongst the statistics. Table 4.4 demonstrates the overall observed and expected percent agreement between each pair of the 6 goodness of fit tests, while tables 4.5a through 4.5e illustrate the agreement across sample sizes. Theoretically, if all of the goodness of fit tests had rejected the null hypothesis 5% of the time, as they should have, then the expected percent agreement between any two tests would be 90.5%. This is the theoretical benchmark for the expected agreement between any two goodness of fit tests. The expected agreement displayed in the tables is based on the percent rejected from the generated data. For example, if two goodness of fit tests never rejected the null hypothesis, we would expect 100% agreement between them.
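The 90.5% benchmark is what results when two tests are treated as independent: they agree when both reject or when both fail to reject. The following is a minimal sketch of that arithmetic, assuming independence between the two tests; the function name `expected_agreement` is illustrative and does not come from the thesis.

```python
def expected_agreement(p1: float, p2: float) -> float:
    """Expected proportion of data sets on which two independent tests agree.

    p1, p2: rejection rates of the two tests, given as proportions.
    Agreement = (both reject) + (both fail to reject).
    """
    return p1 * p2 + (1.0 - p1) * (1.0 - p2)


# Both tests at the nominal 5% level: 0.05*0.05 + 0.95*0.95 = 0.905
print(round(100 * expected_agreement(0.05, 0.05), 1))  # 90.5

# Two tests that never reject are expected to agree on every data set
print(100 * expected_agreement(0.0, 0.0))  # 100.0
```

Substituting the observed rejection rates of a pair of tests for the nominal 5% gives an expected agreement of the kind reported in parentheses in the tables, under the same independence assumption.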

Looking at p-values to attempt to establish the effects of prevalence of the generated outcome variable, the sample size, and the variance and distribution of the predictor variable would not be very productive because of the nearly 60,000 observations. Therefore the F-values resulting from the analysis of variance for testing the null hypothesis of each factor predicting the percent agreement for each pair of the six goodness of fit tests are presented in table A.8.

$X^2$ had the lowest agreement with the other 5 goodness of fit tests; however, it also had the lowest expected agreement based on the results from the percent rejected. In fact, $X^2$ had the highest percent agreement relative to the expected agreement. The highest overall agreement occurred between $D$ and $\hat{H}_g^*$, with an overall agreement of 96.0%, followed by 95.4% agreement between $\hat{C}_g^*$ and $\hat{T}_c$. It is somewhat unanticipated that $X^2$ and $D$ did not have higher observed and expected agreement. However, looking at the high rejection rates for $X^2$ and the low rejection rates for $D$ helps to explain this lack of observed agreement.

Of the 15 comparisons made between the percent agreement of the goodness of fit statistics, 12 had relatively marginal F-values (F-values > 80) for a size effect, showing the lower percent agreement in the larger sample sizes. The remaining three comparisons had much smaller F-values for a size effect (% agreement between $\hat{B}$ and $D$, $X^2$, and $\hat{H}_g^*$; F-value ≤ 40). When taking into consideration that there were 15 comparisons made, multiple comparison methods would greatly reduce the magnitude of the effects of the factors affecting the agreement between goodness of fit tests.

The largest drop in percent agreement across sizes was seen between $\hat{H}_g^*$ and $X^2$, where it dropped from 98.1% for data sets of sample size 20 to 88.3% for data sets of size 500. This drop is also reflected by the large F-value in table A.8 (F-value = 233.5). This is not surprising since the rejection rate for $\hat{H}_g^*$ never reached 5% in any cell in table 4.1, while $X^2$ was often over 10% in the larger sample sizes. All declines in percent agreement across sizes can be seen in tables 4.5a through 4.5e. The drop in percent agreement across sizes is due to the fact that at smaller sample sizes, most of the goodness of fit tests did not reject very much, thus accounting for more observed and expected agreement.

The remaining effects in table A.8 were not as noteworthy as the sample size effect. There was a slight effect for prevalence of the outcome variable in 4 of the 15 pair-wise comparisons. The largest decline in percent agreement was seen between the deviance and $\hat{H}_g^*$, where it went from 98.7% (expected % agreement = 98.7%) in the low prevalence category to 93.5% (expected % agreement = 92.8%) in the high prevalence category (tables A.9 through A.11). This change can also be seen in table A.8, where the F-value representing the effect of prevalence of the outcome variable between one of the statistics and $X^2$ is the largest value in that table (F-value = 355.6).

Table 4.4 - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests for all observations combined.

Table 4.5a - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests when the generated sample size is 20.

GOF Test

Table 4.5b - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests when the generated sample size is 50.

B fk

I

95.1 (34.9)

94.9 (92.8)

95.5 (94.4)

GOF Test

h LI

t ; I

H;

H; D

I X2 I I I I I

fi; D

95.3 (34.8)

96.7 (96.6)

fk

96.6 (94.0)

96.4 (96.1)

96.5 (93.9)

96.3 (95.6)

2;

2;

35.0 (94.4)

95.0 (92.8) 94.0 (91.7) 93.4 (93.2) 94.0 (33.7) 93.8 (91.6)

Table 4.5c - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests when the generated sample size is 100.

GOF Test: $\hat{T}_c$, $\hat{B}$, $D$, $\hat{C}_g^*$, $\hat{H}_g^*$

Table 4.5d - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests when the generated sample size is 200.

Table 4.5e - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests when the generated sample size is 500.

GOF Test

k D

k; 2; X'

Table A.12 summarizes the percent agreement between all 6 goodness of fit tests simultaneously, that is, the percentage of data sets in which all of the tests reject or all fail to reject the null

t e

94.0 (91 .O)

93.2 (92.2)

95.5 (90.4)

93.3 (92.3)

91.7 (87.1)

H;

k

93.4 (92.8)

93.4 (90.0)

94.0 (92.9)

90.9 (87.6)

e; GOF Test ,

k fk

90.5 (88.2)

H;

89.2 (88.7)

D

93.0 (92.1)

94.7 (94.1)

90.8 (88.6)

e;

93.2 (92.2)

91.1 (87.0)

B D


hypothesis simultaneously. The F-values resulting from the analysis of variance for testing the null hypothesis of each factor predicting the percent agreement for all six goodness of fit tests are presented in table A.12. The results and trends were similar to those seen when examining each pair of goodness of fit tests. Once again, as expected, the observed and expected percent agreement was inversely related to the sample size of the generated data. Table A.12 shows the steady decline across sizes from 93.4 to 80.3, and table A.13 shows the strong effect of sample size (F-value = 244.8). There is also a very slight effect of prevalence of outcome seen in table A.13 (F-value = 60.2), but the magnitude is not meaningful.
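The observed pairwise and simultaneous agreement described above can be tabulated directly from the reject / fail-to-reject decisions recorded for each generated data set. The sketch below assumes those decisions are stored as one dictionary per data set; the data, the short test labels and the function names are hypothetical and only illustrate the bookkeeping.

```python
from itertools import combinations


def pairwise_agreement(rejections, a, b):
    """Percent of data sets on which tests a and b make the same decision."""
    same = sum(1 for row in rejections if row[a] == row[b])
    return 100.0 * same / len(rejections)


def simultaneous_agreement(rejections):
    """Percent of data sets on which all tests make the same decision."""
    same = sum(1 for row in rejections if len(set(row.values())) == 1)
    return 100.0 * same / len(rejections)


# Hypothetical decisions (True = reject H0) for four generated data sets
rejections = [
    {"T": False, "B": False, "D": False, "C": False, "H": False, "X2": False},
    {"T": True,  "B": False, "D": False, "C": False, "H": False, "X2": True},
    {"T": False, "B": False, "D": False, "C": False, "H": False, "X2": False},
    {"T": True,  "B": True,  "D": True,  "C": True,  "H": True,  "X2": True},
]

print(pairwise_agreement(rejections, "T", "X2"))   # 100.0
print(pairwise_agreement(rejections, "D", "X2"))   # 75.0
print(simultaneous_agreement(rejections))          # 75.0
print(len(list(combinations(rejections[0], 2))))   # 15 pairs of tests
```

With six tests there are 15 distinct pairs, which is the number of pairwise comparisons discussed in this section.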

4.6 Percent Agreement in Special Cases

The percent agreement in tables A.14 through A.16 appears to be slightly higher than the agreement in cases 1-24; however, the increase is not meaningful. In conclusion, there are no noteworthy differences seen in the percent agreement in the special cases.

4.7 Power

There was a high proportion of non-converging maximum likelihood estimates in data sets of sample size 20 when performing the logistic regression on the data generated for the power calculations. Once more, to prevent any biases from occurring, all of the data generated in that scenario were omitted.

The power to detect each alternative link function at each sample size is displayed in table 4.6. When attempting to detect the normal model, all of the goodness of fit tests had extremely low power. This is not surprising since figure 3.1 illustrates the difficulty in trying to discriminate between the logistic and the normal response models. However, it was somewhat surprising that $\hat{H}_g^*$ demonstrated the highest power, since it had very low rejection rates in the simulation study.

Table 4.6 - Power to detect alternative link functions with predictor variable X~U(-3,3)

                 Alternative Link Function
                 Normal                   Cauchy                   Extreme Value
GOF Test         50    100   200   500    50    100   200   500    50    100   200   500
$\hat{T}_c$      0.2   0.6   0.6   0.6    14.5  31.4  47.0  78.2   4.1   9.0   13.0  22.4
$\hat{B}$        0.2   1.4   0.8   1.4    31.9  36.8  53.6  86.8   6.5   10.2  16.0  43.8
$D$              0.0   0.0   0.0   0.0    0.2   0.6   2.2   34.4   0.0   0.0   0.0   0.0
$\hat{C}_g^*$    0.7   1.0   1.0   1.0    17.9  27.6  39.8  71.6   5.5   8.4   13.6  33.0
$\hat{H}_g^*$    1.3   3.0   4.8   8.0    3.0   8.0   14.0  45.4   3.1   6.0   9.2   27.4
$X^2$            0.4   1.4   1.0   1.0    23.1  38.2  53.8  71.4   6.9   12.4  14.4  20.0

In attempting to detect the Cauchy models, all of the goodness of fit tests demonstrated their highest power, compared with the other alternative link functions. $\hat{B}$ had extremely high power of nearly 87%. In fact, $\hat{T}_c$, $X^2$, and $\hat{C}_g^*$ all had power above 70%. Once again, this is not surprising when inspecting figure 3.2, as it is not as difficult to discriminate between the Cauchy response and the logistic response model with $\beta_0 = 0.267$ and $\beta_1 = 1.318$.

The third situation deals with attempting to detect the extreme value model, which gives results that lie between the results from detecting the normal and Cauchy alternative links. Once again, $\hat{B}$ demonstrated vastly superior power relative to the other goodness of fit tests, with a power of almost twice the size (44%) of the nearest goodness of fit test ($\hat{H}_g^*$, power of 27%). Figure 3.3 illustrates the ability to discriminate between the extreme value response and the logistic response with $\beta_0 = 0.267$ and $\beta_1 = 1.479$.
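To give a rough feel for why the Cauchy alternative is the easiest to detect, the response functions can be evaluated side by side. The sketch below uses the standard, unscaled forms of each response function and assumes that the "extreme value" model corresponds to the complementary log-log response; it does not reproduce the coefficients or the matching of curves used in the thesis, so it is only an illustration of the general shapes.

```python
import math


def logistic(eta):
    # Logistic response function
    return 1.0 / (1.0 + math.exp(-eta))


def normal(eta):
    # Standard normal CDF (probit response)
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def cauchy(eta):
    # Standard Cauchy CDF
    return 0.5 + math.atan(eta) / math.pi


def extreme_value(eta):
    # Complementary log-log response (assumed form of the extreme value model)
    return 1.0 - math.exp(-math.exp(eta))


# Tabulate the four responses over a grid of linear predictor values
for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    values = [round(f(eta), 3) for f in (logistic, normal, cauchy, extreme_value)]
    print(eta, values)
```

In these unscaled forms the Cauchy curve approaches 0 and 1 much more slowly than the logistic curve, and the extreme value curve is asymmetric about 0.5, whereas the normal curve has the same symmetric S-shape as the logistic.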

Chapter 5

Application of Goodness of Fit Tests

to Real Data

Patients with Systemic Lupus Erythematosus (SLE) have been followed in a prospective manner at the University of Toronto Lupus Clinic since 1970. Clinical, laboratory, and therapeutic information has been collected on a standard protocol, and stored on a computer database. In a study to determine the natural history of hypercholesterolaemia in the first 3 years of disease in an inception cohort of patients with SLE (Bruce et al., 1999), multiple logistic regression was applied to determine the best predictors of sustained hypercholesterolaemia. Several predictor variables were postulated, and in fact three variables were found to be significantly associated with the prevalence of sustained hypercholesterolaemia. As a subset of the analysis in the described study, one of the predictor variables which was found to be significantly associated with sustained hypercholesterolaemia, age of onset of SLE, was used in a univariate logistic regression model. The distribution of the predictor variable age of onset of SLE is illustrated by graph 5.1.

Figure 5.1 - Histogram of Age of Onset of SLE

The proportion of y = 1 (sustained hypercholesterolaemia) was 0.4. The sample size of the data set was 134. The mean of the predictor variable, age of onset, was 34.8 years of age, the median age of onset was 32, the standard deviation was 14.5, with a minimum age of onset of 13.2 years and a maximum value of 83.1 years. The raw data for the age of onset were accurate to within four decimal points. Therefore, in order to apply these data to a binomial logistic regression model having fewer covariate patterns than observations, the data were rounded off to one decimal point. The rounding resulted in age of onset having 40 covariate patterns.

The data set from the University of Toronto Lupus Clinic used in this analysis most closely resembles the data from case 29 (size 100); they both have relatively large variances, although age of onset of lupus is not normally distributed, but it is not heavily skewed.

Table 5.1 displays the p-values, parameter estimates, and likelihood ratio test (testing $H_0: \beta_1 = 0$) for the initial analysis, where the observations are grouped to form 40 covariate patterns. Five out of six goodness of fit tests indicated that the fitted logistic model was not a poor fit, while the deviance statistic was the only one to indicate a poor fit.

Table 5.1 - P-values for goodness of fit tests applied to the hypercholesterolaemia in SLE study with 40 covariate patterns

GOF Test         P-value   Result
$\hat{T}_c$      0.2836    Do not Reject H0
$D$              0.0075
$\hat{C}_g^*$    0.6908    Do not Reject H0

-2 log L = likelihood ratio test testing $H_0: \beta_1 = 0$

At first glance, the results of the deviance seem puzzling, especially considering the lack of rejections in several situations. Moreover, the deviance did not reject at all in 4,000 data sets with larger variances (cases 28 and 29). However, the number of covariate patterns does not appear to allow for as much replication as needed for the deviance $D$ and Pearson's chi-squared $X^2$ to have an approximate chi-squared distribution with J - p - 1 degrees of freedom.

In an attempt to confirm the reason for the disagreement of the deviance test statistic with the other goodness of fit tests, the data were modified to yield more replications in the predictor variable. Age of onset of lupus was rounded off such that ages were grouped by 5-year intervals, resulting in 12 covariate patterns, rather than 40. The p-values from the goodness of fit tests can be seen in table 5.2.
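Grouping a continuous predictor into intervals, as was done for age of onset, can be sketched as follows. This is a minimal illustration that assumes intervals start at multiples of the chosen width; the thesis does not state the exact rounding rule, and the ages shown are hypothetical rather than the lupus clinic data.

```python
def covariate_patterns(ages, width):
    """Group ages into intervals of the given width and count the patterns.

    Each age is mapped to the lower bound of its interval. The distinct
    lower bounds are the covariate patterns and the counts are the m_j.
    """
    counts = {}
    for age in ages:
        lower = int(age // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages of onset (not the lupus clinic data)
ages = [13.2, 17.9, 21.4, 24.0, 26.5, 32.0, 33.7, 38.1, 45.6, 83.1]

print(covariate_patterns(ages, 5))
# {10: 1, 15: 1, 20: 2, 25: 1, 30: 2, 35: 1, 45: 1, 80: 1}
print(len(covariate_patterns(ages, 10)))  # 5
```

Wider intervals give fewer covariate patterns and larger counts per pattern, which is the replication that the chi-squared approximation for $D$ and $X^2$ relies on.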

Table 5.2 - P-values for goodness of fit tests applied to the hypercholesterolaemia in SLE study with 12 covariate patterns

GOF Test         P-value   Result
$\hat{T}_c$      0.2965    Do not Reject H0
$\hat{B}$        0.9106    Do not Reject H0
$D$              0.0989    Do not Reject H0
$\hat{C}_g^*$    0.6918    Do not Reject H0
$\hat{H}_g^*$    0.7824    Do not Reject H0

It appears as though the additional grouping has allowed $D$ to fail to reject the null hypothesis. Surprisingly, the p-value for $X^2$ was essentially unchanged. le Cessie and van Houwelingen's $\hat{T}_c$ and Hosmer and Lemeshow's $\hat{C}_g^*$ also did not differ from the initial analysis. $\hat{B}$ and $\hat{H}_g^*$ both had higher p-values than in the analysis with 40 covariate patterns.

In order to determine the effects of further grouping, the data were modified yet again to yield even more replications in the predictor variable. Age of onset of lupus was rounded off such that ages were grouped by 10-year intervals, resulting in only 6 covariate patterns. The p-values from the goodness of fit tests can be seen in table 5.3.

Table 5.3 illustrates that the additional grouping substantially increased the p-values of $D$ and $X^2$. The changes are not surprising given the fact that the composition of the predictor variable changed considerably. $\hat{T}_c$, $\hat{B}$, and $\hat{H}_g^*$ remained stable, while $\hat{C}_g^*$ was the only goodness of fit test to have its p-value drop considerably. This is due to the fact that there were only enough predicted probabilities to form 6 groups, with some sizeable deviations between the observed and expected values in some of the cells. The decrease in the p-value from $\hat{C}_g^*$ goes against the likelihood ratio test indicating a stronger case for $\beta_1$ not being 0.

Table 5.3 - P-values for goodness of fit tests applied to the hypercholesterolaemia in SLE study with 6 covariate patterns

GOF Test         P-value   Result
$\hat{T}_c$      0.3377    Do not Reject H0
$D$              0.2199    Do not Reject H0
$\hat{C}_g^*$    0.2331    Do not Reject H0
$\hat{H}_g^*$    0.7736    Do not Reject H0

The results show that the deviance $D$ and Pearson's $X^2$ might be sensitive to the number of covariate patterns, which makes sense since it is used in calculating $D$ and $X^2$. $\hat{C}_g^*$ also changed dramatically due to the changes in the grouping, as revealed in chapter 2. Hosmer and Lemeshow's $\hat{H}_g^*$ and Brown's score statistic $\hat{B}$ seemed consistent through both sets of data, and thus was

Chapter 6

Discussion and Conclusions

The controlling factors in this paper were able to demonstrate that the deviance statistic $D$ did not reject the null hypothesis of an underlying logistic model as many times as expected. One way to look at the lack of rejections in the deviance statistic would be to say that it had a low type I error, which would seem to be a good characteristic in a goodness of fit test statistic. However, in too many instances, $D$ never rejected the null hypothesis more than 5% of the time, signifying a problem. Moreover, when looking at the power analysis, $D$ had practically no power at all, except for the Cauchy alternative in data sets of sample size 500, when it reached 34%. Looking at the generated data, and how it relates to $D$, may explain the reason for the low rejection rates. In chapter 5, it was also shown that when the data were manipulated to decrease the number of covariate patterns, $D$ went from rejecting the null hypothesis of an underlying logistic model to not rejecting the null hypothesis.


It was shown in [2.10] and [2.13] that for the case where there are J = n covariate patterns, $D$ and Pearson's $X^2$ respectively are invalid as goodness of fit test statistics. The reason for the uselessness of these goodness of fit tests is that under the full model, as n approaches infinity, the number of estimated parameters also approaches infinity. However, when the binary observations can be grouped, as n approaches infinity, the number of estimated parameters remains fixed for covariate pattern j, which is required for the chi-squared approximation.
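One standard way to see the problem when J = n, offered here as a textbook-style sketch and not as a restatement of [2.10], is to write the deviance for binary $y_i$ in terms of the fitted probabilities. The symbols $\hat{\pi}_i$ (fitted probability) and $\hat{\eta}_i$ (fitted linear predictor) are notation assumed for this sketch. For a maximum likelihood fit that includes an intercept, the likelihood equations imply $\sum_i (y_i - \hat{\pi}_i)\hat{\eta}_i = 0$, so that:

```latex
\begin{aligned}
D &= -2\sum_{i=1}^{n}\left[\,y_i\log\hat{\pi}_i + (1-y_i)\log(1-\hat{\pi}_i)\right] \\
  &= -2\sum_{i=1}^{n}\left[\,y_i\,\hat{\eta}_i + \log(1-\hat{\pi}_i)\right],
     \qquad \hat{\eta}_i = \log\frac{\hat{\pi}_i}{1-\hat{\pi}_i} \\
  &= -2\sum_{i=1}^{n}\left[\,\hat{\pi}_i\,\hat{\eta}_i + \log(1-\hat{\pi}_i)\right]
     \qquad \text{since } \sum_{i=1}^{n}(y_i-\hat{\pi}_i)\,\hat{\eta}_i = 0 \\
  &= -2\sum_{i=1}^{n}\left[\,\hat{\pi}_i\log\hat{\pi}_i + (1-\hat{\pi}_i)\log(1-\hat{\pi}_i)\right].
\end{aligned}
```

In this case $D$ depends on the data only through the fitted probabilities, so it cannot measure how far the observed outcomes lie from the fitted values.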

However, for intermediate cases where each $m_j$ exceeds 1 or most of the $m_j$ are greater than 1, but some are very small, the chi-squared approximation to the null distribution of $D$ and $X^2$ still may be inadequate. When there are small values for $m_j$, there is less replication and some of the fitted probabilities will be close to 0 or 1. $D$ and $X^2$ can be used as goodness of fit test statistics. However, in computing the significance levels, the conditional distribution of the statistic given the observed $\hat{\beta}_0$ and $\hat{\beta}_1$ should be used (McCullagh and Nelder, 1989). Therefore, the results reiterate that it is not sufficient to have each $m_j$ greater than 1; there must be enough replication such that most or all of the $m_j$ are large enough. Therefore, $D$ and $X^2$ are not useful when applied to the data generated in this thesis. $D$ may be useful in the situation where all the predictor variables are categorical (Simonoff, 1998) or come from an experiment where the levels are controlled. In fact, when the predictor variables are continuous, and not pre-specified to allow adequate replication, only $X^2$ can be useful, by computing the conditional mean and variance of $X^2$ to calculate p-values, similar to what was done by Hosmer et al. (1997).
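For reference, the grouped-data forms of the two statistics discussed here can be computed as in the sketch below. It assumes the usual binomial definitions of the deviance and Pearson chi-squared with $m_j$ observations and $y_j$ events at covariate pattern j; the function name, argument names and example data are illustrative and are not taken from the thesis.

```python
import math


def grouped_gof(m, y, pi_hat, n_params):
    """Deviance D, Pearson X^2 and degrees of freedom for grouped binomial data.

    m[j]      : number of observations sharing covariate pattern j
    y[j]      : number of those observations with outcome y = 1
    pi_hat[j] : fitted probability for pattern j (strictly between 0 and 1)
    n_params  : number of fitted parameters, intercept included (p + 1)
    """
    D = 0.0
    X2 = 0.0
    for mj, yj, pj in zip(m, y, pi_hat):
        expected = mj * pj
        # Terms with yj == 0 or yj == mj contribute nothing to the deviance
        if yj > 0:
            D += 2.0 * yj * math.log(yj / expected)
        if yj < mj:
            D += 2.0 * (mj - yj) * math.log((mj - yj) / (mj - expected))
        X2 += (yj - expected) ** 2 / (expected * (1.0 - pj))
    df = len(m) - n_params  # J - p - 1
    return D, X2, df


# Hypothetical example: 4 covariate patterns, 10 observations each
D, X2, df = grouped_gof([10] * 4, [3, 7, 3, 7], [0.5] * 4, n_params=2)
print(round(D, 2), round(X2, 1), df)  # 6.58 6.4 2
```

Inspecting the smallest $m_j$ alongside these statistics is a simple way to judge whether there is enough replication for the chi-squared reference distribution to be trusted.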

The best overall performance was that of Brown's score statistic $\hat{B}$. It seemed to perform well with overall steady rejection rates, with the type I error rarely reaching as high as 6% in any situation, including the special cases. Furthermore, it had the highest power in attempting to detect the Cauchy and extreme value models, especially in data sets with smaller sample sizes. This is very impressive considering the low type I error rates across all scenarios. Brown (1982) showed similar results under similar situations. Nevertheless, it is not known how $\hat{B}$ would perform in other situations, such as power with omission of variables from models or power to detect other alternative link functions.

As indicated in section 2.3.1, $\hat{B}$ is a test of the adequacy of fit of the logistic model relative to the specified general parametric family of models in [2.18], and not all other possible models. However, it did have high power against the Cauchy model, which is not part of the family of models in [2.18]. Brown (1982) also claimed that the family of models was sufficiently wide to cover many other symmetric and asymmetric models. Also, $\hat{B}$ can be used in both binary and binomial logistic regression. The results from this thesis indicate that $\hat{B}$ would be the best goodness of fit statistic under the arrangement created. It seemed to be particularly superior to the other tests when the sample size of the data sets was small.

Hosmer and Lemeshow's $\hat{C}_g^*$ performed admirably. It was relatively consistent in its rejection rates, only having a type I error above 6% in three cells in table 4.1. Although the rejection rates were somewhat affected by the sample size, this was mostly true when the sample was only size 20. In looking at figure 4.4, if the plots started at size 50, the rejection rates are usually near the 5% level. The power of $\hat{C}_g^*$ in attempting to detect the Cauchy and extreme value models usually only ranked 4th amongst the 6 goodness of fit tests. It has also been shown to be unreliable depending on the groups of risk (Hosmer et al., 1997). Overall, this test was designed for the binary logistic regression setting; however, it did fairly well in this binomial setting, and thus would be useful in helping to determine model adequacy. These statements are applicable to $\hat{C}_g^*$ as calculated by SAS software.
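As a point of reference for how a statistic of this type is formed, the sketch below computes a Hosmer-Lemeshow style statistic from groups of nearly equal size based on the sorted fitted probabilities. It is a simplified illustration for binary outcomes: it does not reproduce the grouping rule or the handling of ties used by SAS, and the function name and example data are hypothetical.

```python
def hosmer_lemeshow_c(y, pi_hat, groups=10):
    """Hosmer-Lemeshow style statistic from groups of sorted fitted probabilities.

    y      : list of 0/1 outcomes
    pi_hat : list of fitted probabilities, in the same order as y
    Returns (statistic, degrees of freedom = groups - 2).
    """
    n = len(y)
    if n < groups:
        raise ValueError("need at least one observation per group")
    pairs = sorted(zip(pi_hat, y))
    stat = 0.0
    for k in range(groups):
        # Consecutive slices of the sorted data form the groups
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        n_k = len(chunk)
        observed = sum(yi for _, yi in chunk)
        mean_pi = sum(pi for pi, _ in chunk) / n_k
        stat += (observed - n_k * mean_pi) ** 2 / (n_k * mean_pi * (1.0 - mean_pi))
    return stat, groups - 2


# Hypothetical example with 6 observations split into 3 groups
stat, df = hosmer_lemeshow_c([0, 0, 0, 1, 1, 1],
                             [0.1, 0.1, 0.5, 0.5, 0.9, 0.9], groups=3)
print(round(stat, 3), df)  # 0.444 1
```

Because the groups are defined by the fitted probabilities, changing the number of distinct fitted values, for example by regrouping the predictor, changes the cells that enter the statistic and can move the p-value considerably.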

In many situations, Hosmer and Lemeshow's $\hat{H}_g^*$ did not reject the null hypothesis as much as expected. The reason might be that the choice of the pre-specified cutoff points into 10 groups may not have been appropriate in all situations, especially for data sets of sample size 20. Surprisingly, against a normal alternative link function, it demonstrated much higher power than any of the other goodness of fit tests. However, the power of $\hat{H}_g^*$ in attempting to detect the Cauchy and extreme value models usually only ranked 5th amongst the 6 goodness of fit tests, only performing better than $D$. This test would not be very useful unless the cutoff points are specified in a certain way, which may involve looking carefully at the data. Overall, it is not one of the preferred goodness of fit tests under the conditions in this thesis.

le Cessie and van Houwelingen's $\hat{T}_c$ had results somewhere between $\hat{C}_g^*$ and $X^2$. It was very sensitive to the sample size of the data set. At times, for data sets of sample size 20 and 50, $\hat{T}_c$ did not reject the null hypothesis enough. Conversely, in data sets of size 500, it had a relatively large type I error, at times higher than 10%. $\hat{T}_c$ can be used in both binary and binomial logistic regression, as it was defined under the binary case. The power of $\hat{T}_c$ in attempting to detect the Cauchy and extreme value models usually ranked 3rd or 4th amongst the 6 goodness of fit tests. $\hat{T}_c$ would be a useful goodness of fit test, especially since it gives a different perspective, as it is based on nonparametric kernel methods.

Overall, when assessing the goodness of fit of a logistic model in a situation where the predictor variable is continuous, it would be useful to use Brown's score statistic $\hat{B}$, Hosmer and Lemeshow's $\hat{C}_g^*$, and le Cessie and van Houwelingen's $\hat{T}_c$.


One of the areas where further research would shed more light on the goodness of fit tests is multiple logistic regression with mixed predictor variables. The results from Hosmer et al. (1997) did not differ when comparing their univariate and multivariate results. Brown's (1982) score statistic and le Cessie and van Houwelingen's (1991) statistic did not have different results in the multivariate setting in their original papers. The deviance and Pearson statistics would require even larger sample sizes to ensure sufficient replication, that is, fewer covariate patterns, within the predictor variable. In the situation of multiple logistic regression on continuous predictor variables, they would not be useful as goodness of fit tests.

Another interesting approach would be to generate the data with few covariate patterns, so that the Pearson and deviance chi-squared tests can be assessed with the other goodness of fit tests. If in fact the other tests perform at or near the same level as the Pearson and deviance tests, then there would be no use for them.

It was the aspiration of this thesis to determine the different characteristics, strengths and weaknesses of the goodness of fit statistics. The difficulty with such a task is that practically any goodness of fit test can be shown to perform poorly under specific circumstances. Of course, the results from this thesis will not be applicable to logistic regression involving completely different scenarios.

Appendix A

Additional Tables

Analysis of Variance Tables for Summarized Data

Table A.1 - Analysis of variance p-values and F-values with a continuous outcome variable (the number of rejections / the number of replications), 4 main effects, and 6 first order interactions for each of the six goodness of fit tests. The main effects examined were the size of the generated data set, the distribution and variance of the predictor variable x, and the prevalence of outcome.

Factor               DF
n                    4
Prev(y)              2
Var(x)               1
Dist(x)              1
Prev(y) x n          8
Var(x) x n           4
Dist(x) x n          4
Prev(y) x Var(x)     2
Prev(y) x Dist(x)    2
Var(x) x Dist(x)     1

GOODNESS OF FIT TEST

fk B P-value

D

<0,001 (1 07.2) O. 157 (1.9) 0.362 (1.3) 0.41 1 (0.7) C 0.001 (6.0) c 0.001 (9.2) 0.189 (1 -6)

0.980 (0.02) 0.9û7 (0.1) 0.154 (2.1)

q

0.01 8 (3.1)

<0.001 (1 8.8) 0.007 (7.6)

< 0.001 (13.6) 0.462 (1 .O) 0.9 17 (02) 0.023 (3.0) 0.125 (2.1) 0.012 (4.7) 0.503 (0.5)

c0.001 (1 1.6) <0.001 (1 2.0) 0.01 5 (6.2) 0.663 (0.2) 0.006 (2.9) 0.454 (0.9) 0.017 (3.2) 0.672 (0.4) 0.616 (0.5) O.!lO4 (0.0 1)

8; x2

~0.001 (64.1) 0,0184 (4.2) 0.006 (8.0) 0.004 (8.8) 0.512 (0.9) 0.254 (1.4) 0.00 1 (5.2) 0.363 (1 .O) 0.057 (3.0) 0.20 1 (1 .7)

(F-value) < 0.001 (39.6) 0.165 (1.8) 0.112 (2.6) 0.253 (1.3)

< 0.001 (4.0) 0.279 (1.3) 0.199 (1.5) 0.856 (0.2) O. 144 (2.0) 0.354 (0.9)

<0.001 (1 G. 8)

< 0.001 (26.0) 0.100 (2.8)

< 0.001 (1 5.5) 0.640 (0.8) 0.763 (0.5) 0.188 (1.6) 0.355 (1 .l) 0.934 (0.07) 0.1 10 (26)

Percent Rejected in Special Cases

Table A2 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics, the special case of the predictor variable generated with a higher variance of 5.

1 1 Test Statistic 1

Table A3 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics, the special case of the predictor variable generated with a higher variance of 25.

1 1 Test Statistic 1

Size of Generated

Table A4 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics for case 5 with the predictor variable generated with a variance of 1.

Data

100 (n=500) 200 (n=500) 500 (n=500)

c c

Percentage of Rejecting HO

Size of Generated Data

20 (n=475) 50 (n=500)

100 (n=500) 200 (n=500) 500 (n=500)

2.0 3 .O 4.6

0.4 2.6 50

Test Statistic

H; 6; B

Percentage of Rejecting Ho

x2 D

5.0 3.4 4.8

x2 fk

1.9 7.2 8.0 9.2 13.6

0.0 0.0 0.0

0.0 1.6 1.4 3.0 2.4

B

3.2 5.0 4.0 3.8 5.4

1.2 3.8 8.0

2; D

0.8 3.4 2.4 5 .O 3 .O

1.3 3.6 2.6 4.8 8.2

1.4 4.6 3.4

H:

5.5 5.0 4.6 4.6 6.0

Table A5 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics for case 6 with the predictor variable generated with a variance of 1.

Size of Generated

Table A6 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics for case 7 with the predictor variable generated with a variance of 2.

Data

20 (n=472) 50 (n=500)

100 (n=500) 200 (n=500) 500 (n=500)

Test Statistic

Percentage of Rejecting HO

fk

Size of Generated

Table A7 - Percentages of rejecting the null hypothesis of an adequate logistic model fit. For each of the six goodness of fit statistics for case 8 with the predictor variable generated with a variance of 2.

il D

0.8 3.6 4.0 3.6 3.8

Data

20 (n=454) 50 (n=500)

4.0 5.0 5 ,O 4.2 5 .O

0.2 0.6 1.8 1.2 1.2

Test Statistic 1

e;

Percentage of Rejecting Ho

Size of Generated

3.2 4.8 5.0 4.6 6.4

k

H;

D Ê

1.5 3.0

Test Statistic

X'

0.8 3.2 3 ,4 4.2 4.2

2.6 4.4

2.3 7.8 10.6 13.6 12.8

Ci

0.7 0.G

X'

0.4 3.6

3.3 5.2

%

H;

3.3 6.6

?k

X2

D Ê Ci

Analysis of Variance for Percent Agreement

Table A8 - F-values for testing the null hypothesis of each factor predicting the percent agreement between pairs of goodness of fit tests. The factors examined were the size of the generated data set, distribution and variance of the predictor variable x, prevalence of events (y = 1), and all first order interactions.

Factor

n Prev(y) Var(x) Dis t(x) Prev(y) x n

Varlx) x n

Table A8 - continued

df

4

Dist(x) x n

Prevb) xVar(x) Prev(y) xDist(x) Var@) x Dis t (x)

- 3

1 1 8 4

F-value for Percent Agreement between Pairs of Tesa,

3 - 3

- 3

1

Factor

n Prev(y) Var(x) Dist(xj

f , a n d ~ 143.1 4.33 29.0 1.9 5.0 11.4 1.7 O. 8 0.4 0.02

PrevOf) x n 8 2.7 3.9 1.4 1.9 3.3 Var(x) x n 4 0.1 1.5 1.5 4.2 1.6

Prev(y) xDist(x) - 9 20.9 0. 1 0.9 1.7 28.2

Vat(x) xDist(x) 1 1.4 1 -7 3.3 4.5 2.9

df

4 - 3

1 1

f k a n d D 142.4 31.7 10.5 12.4 13.8 0.4 12.4 1.6 19.1 7.7

30.1 17.0 16.2 3.4

35.9

J

F-value for Percent Agreement between Pairs of Tests

& u i d ~ ' 123.5

f , a n d C ; 97.4

0.8 0.6

2.1 O. 1

f k a n d ~ ; 203.3

O. 1 0.1 28.2 11.5

3 - d ~ 39.6

31.4 5.5 56.1 0.9

31 .O 1 15.5 2.9 1.7 0.6 11.8

b a n d e ;

132.5 1 9.4 53.0 3.0 79.7 1 8.4

2.8 325 2.1 4.9 11.0

an de;' bandf i ; 80.5

b m d ~ ' 10.0

01.7 8.3

31.7

114.3 45.1 3.0 2.4

11.9 0.1 1 -9

Table A8 - continued

Table A9 - Observed and expected percent agreement between all 6 goodness of fit tests

. . . . . . . . - . - . .

Factor

n Prww Var (x) Dis t (x)

. - - - --

Prev(y) x n Var(x) x n Dist(x) x n

Prev(y) xVar(x) Prw(y) xDist(x) Var(x) x Dis t (x)

1 Size 1 Petcentage of AU 6 Statistics in 1

. . .

df

4 - 9 1 1 8 4

F-vahe for Percent Agreement between Pairs of Tests

4

- 7

- 7

Table A10 - F-values for testing the null hypothesis of each factor predicting the percent agreement between all 6 goodness of fit tests. The factors examined were the size of the generated data set, distribution and variance of the predictor variable x, prevalence of events (y = 1), and all first order interactions.

H; = d x 2 233.5 121 30.9 56.5 2.5 4.9

5

500 Ove rall

Factor df F-vdue for Aii 6 Goodness of Fit Tests n 4 244.8

Var(x) 1 1.9

~~d H; 95.3 355.6 37.4

. 17.3 10.3 10.3 0.6 11.7 27.8

80.3 (70.3) 85.5 ('77.8)

C;=d&; 82.5 5.0 10.8 28.9

. . -. .

5.9 1.4

' D W I X ~ 150.8 9.5 5.8 2.6 4.8 6.2

1 1 41.1

t; - d x 2 164.0 23.7 3.1 38.9

- . 0.8

3.4 15.1 1.2 9.4

Dis t(x)

Prevh) x n Var(x) x n Dist(x) x n

L

Preve) xVar(x) Prev(y) xDis t(x)

Var(x) xDis t(x)

2.3

2.9 1.9 1.6

1 8

4 4

2 9

1

0.2 1 2.1

6.9 3.5 2.9 10.7 1.8 5.5 6.2

16.6

17-0

1.3 7.9

18.5 ,

3.6 6.7

Table A11 - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests for cases 25-27

Table A13 - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests for case [case number illegible]

Statistic

3 D

k; k; X'

Table A12 - Overall observed and expected (in parentheses) percent agreement between the rejections of all pairs of the 6 goodness of fit tests for case 2[second digit illegible]

GOF Test

B D

c;

fk

35.5 (94.8)

96.5 (96.2)

f'k

35.9 (90.9)

94.9 (94.9)

35.6 (90.5)

8; GOF Test

Ê

94.4 (93.6)

95.6 (94.6) 93.0 (92.0)

97.1 (97.0) 1 95.5 (94.4)

93.4 (93.0) [ 93.6 (90.4)

fi; X'

D

B

95.8 (95.8)

94.6 (9 1.4)

91.9 (91.5)

96.2 (88.7)

S;

94.3 (33.4)

96.4 (95.8)

93.3 (91.8)

Ci D

95.4 (95.4)

92.8 (92.4)

94.4 (89.5)

D 3 ,

c;

H;

92.1 (92.0)

94.2 (89.2)

96*4 (96.4)

93.4 (93.4)

Ê

H;

4

94.5 (94.2)

91.2 (90.2)

90.4 (90.1)

33.6 (92.6)

Appendix B

SAS Programs

SAS Macro to Generate Data Sets

/***************************************************************************/
/*                 S A S   M A C R O   G E N F I L E S                     */
/***************************************************************************/
/* Used to generate data sets for cases 1-24 having a predictor variable   */
/* coming from a normal or uniform distribution                            */
/* X ~ N(mu,sigmasq); Y comes from logistic with alpha = beta0, beta = beta1 */
/* r = number of replicates   s = seed number   n = size of data set       */
/* mu = mean and sigmasq = variance when X ~ Normal                        */
/***************************************************************************/
%macro GENFILES(r,n,s,mu,alpha,beta,sigmasq);

options nonotes;

%do k=1 %to &r;
data sim1;
  do i=1 to &n;

    seed=&k+&s;

    /*** Keep exactly one of the three x1 assignments active for a given case ***/
    /*** (assumption: the alternatives were selected by editing the macro)    ***/

    /*** when x1 ~ N(mu,sigmasq) ***/
    x1=&mu+sqrt(&sigmasq)*normal(seed);

    /*** when x1 ~ U(8.27,11.73) => mean=10, variance=1; needs runi drawn first ***/
    /* x1=3.46*runi+8.27; */

    /*** when x1 ~ U(7.55,12.45) => mean=10, variance=2; needs runi drawn first ***/
    /* x1=4.90*runi+7.55; */

    /*** Round off x1 to get replication to create fewer covariate patterns than observations ***/
    x=round(x1,.1);

    p_x=exp(&alpha+&beta*(x-10))/(1+exp(&alpha+&beta*(x-10)));
    runi=ranuni(seed);
    if p_x>runi then y=1; else y=0;
    output;

  end;
run;

/*** Extract from data set sim1 to get data set of sample size 20 ***/
data case2.samn1&k; set sim1; if _n_ ge 1 and _n_ le 20 then output; run;

/*** Extract from data set sim1 to get data set of sample size 50 ***/
data case2.samn2&k; set sim1; if _n_ ge 21 and _n_ le 70 then output; run;

/*** Extract from data set sim1 to get data set of sample size 100 ***/
data case2.samn3&k; set sim1; if _n_ ge 71 and _n_ le 170 then output; run;

/*** Extract from data set sim1 to get data set of sample size 200 ***/
data case2.samn5&k; set sim1; if _n_ ge 321 and _n_ le 520 then output; run;

/*** Extract from data set sim1 to get data set of sample size 500 ***/
data case2.samn6&k; set sim1; if _n_ ge 521 and _n_ le 1020 then output; run;

%end;
%mend GENFILES;

/*****************************************************************/
/*           E N D   O F   G E N F I L E S   M A C R O           */
/*****************************************************************/
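The generation step in GENFILES can be illustrated with a short stand-alone sketch. It is not the thesis code: it assumes a current SAS release, uses the newer RAND and STREAMINIT routines in place of NORMAL and RANUNI, and the seed, sample size and coefficient values (alpha = 0, beta = 1) are hypothetical. It shows how rounding x to one decimal place produces fewer covariate patterns than observations.

```sas
/* Hypothetical sketch: one replicate of n = 100 with X ~ N(10, 1) */
data demo;
  call streaminit(12345);                    /* assumed seed */
  do i = 1 to 100;
    x1  = rand('normal', 10, 1);             /* mean 10, standard deviation 1 */
    x   = round(x1, 0.1);                    /* rounding creates repeated covariate patterns */
    p_x = 1 / (1 + exp(-(0 + 1*(x - 10))));  /* logistic model, alpha = 0 and beta = 1 */
    y   = (rand('uniform') < p_x);           /* y = 1 with probability p_x */
    output;
  end;
run;

/* Count distinct covariate patterns against the number of observations */
proc sql;
  select count(distinct x) as n_patterns, count(*) as n_obs
  from demo;
quit;
```

The ratio of n_patterns to n_obs gives a rough sense of how much replication is available to the Pearson and deviance tests.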

SAS Macro to Generate Data Sets for Power Calculations

/***************************************************************************/
/*                  S A S   M A C R O   P O W N O R                        */
/***************************************************************************/
/* Used to generate data sets for power calculation when the alternative   */
/* link is a normal, Cauchy, or extreme value with the predictor variable  */
/* X ~ U(-3,3) with alpha = beta0 and beta = beta1                         */
/* r = number of replicates   s = seed number   n = size of data set       */
/***************************************************************************/
%macro POWNOR(r,n,s,alpha,beta);
options nodate nonumber nonotes;

%do k=1 %to &r;
data sim1;
  do i=1 to &n;

    seed=&k+&s;
    runi=ranuni(seed);
    x1=-3+6*runi;

    /*** Round off x1 to get replication to create fewer covariate patterns than observations ***/
    x=round(x1,.1);

    /*** The three link alternatives follow; as listed, the last assignment to p_x takes effect ***/

    /*** For a probit alternative link function ***/
    p_x=probnorm(&alpha+&beta*x);

    /*** For an extreme value alternative link function ***/
    p_x=1-exp(-exp(&alpha+&beta*x-0.3665));

    /*** For a Cauchy alternative link function ***/
    p_x=0.5+atan(2*(&alpha+&beta*x))/3.1415927;

    runi2=ranuni(seed);
    if p_x>runi2 then y=1; else y=0;
    output;

  end;
run;

/*** To fix SAS version 6.12 bug which will not produce deviance ***/
/*** and Pearson goodness of fit tests, x = x+0.0 fixes the bug  ***/

/*** Extract from data set sim1 to get data set of sample size 20 ***/
data pownor2.samn1&k; set sim1; x=x+0.0; if _n_ ge 1 and _n_ le 20 then output; run;

/*** Extract from data set sim1 to get data set of sample size 50 ***/
data pownor2.samn2&k; set sim1; x=x+0.0; if _n_ ge 21 and _n_ le 70 then output; run;

/*** Extract from data set sim1 to get data set of sample size 100 ***/
data pownor2.samn3&k; set sim1; x=x+0.0; if _n_ ge 71 and _n_ le 170 then output; run;

/*** Extract from data set sim1 to get data set of sample size 200 ***/
data pownor2.samn5&k; set sim1; x=x+0.0; if _n_ ge 171 and _n_ le 370 then output; run;

/*** Extract from data set sim1 to get data set of sample size 500 ***/
data pownor2.samn6&k; set sim1; x=x+0.0; if _n_ ge 371 and _n_ le 870 then output; run;

%end;
%mend POWNOR;

/*****************************************************************/
/*            E N D   O F   P O W N O R   M A C R O              */
/*****************************************************************/
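The constants in the alternative links of POWNOR appear to centre each curve so that the event probability is 0.5 when the linear predictor is zero: 0.3665 is -log(log(2)) to four decimals, so the extreme value link returns 1 - exp(-log 2) = 0.5 at zero. The following sketch, which is an illustration and not part of the thesis programs, tabulates the logistic curve against the three alternatives on a hypothetical grid of linear predictor values (the variable name eta and the data set name links are assumptions).

```sas
/* Hypothetical sketch: compare the link functions on a grid of eta = alpha + beta*x */
data links;
  do eta = -3 to 3 by 1;
    p_logit   = 1 / (1 + exp(-eta));               /* logistic (null model) */
    p_probit  = probnorm(eta);                     /* probit alternative */
    p_extreme = 1 - exp(-exp(eta + log(log(2))));  /* log(log(2)) = -0.3665 */
    p_cauchy  = 0.5 + atan(2*eta) / constant('pi');/* Cauchy alternative */
    output;
  end;
run;

proc print data=links noobs;
run;
```

All four columns equal 0.5 at eta = 0 and diverge in the tails, which is where a goodness of fit test has a chance to detect the misspecified link.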

SAS Macro for le Cessie and van Houwelingen's T-hat

Structure: %FITTEST(BETAS=SASdataset, PRED=SASdataset, DEP=variable, BANDWID=value, VAR=variables)

Input parameters:
BETAS  : The output dataset of PROC LOGIST, containing the final parameter estimates and their covariance matrix.
PRED   : The output dataset of PROC LOGIST, containing all variables and observations of the input dataset along with the predicted probability of the outcome variable being 1.
DEP    : The dependent variable in the logistic analysis.
BANDWID: The standardized bandwidth. If no bandwidth is specified, the default value is chosen such that each smoothing region contains about sqrt(n) points.
VAR    : The smoothing variables. These are the variables according to which the residuals in the test statistic are smoothed. We recommend to use only the variables corresponding to the linear terms in the model. The smoothing variables need to be continuous variables !!!!!!

PROC SORT DATA=&PRED; BY &VAR; * SORT OBSERVATIONS BY SMOOTHING VARIABLES;

PROC IML;

* GET SMOOTHING VARIABLES FROM SAS-DATASET PRED;
USE &PRED;
READ ALL VAR {&VAR} INTO X;   * SMOOTHING VARIABLES INTO MATRIX X;

* FIND DIMENSIONS;
D = NCOL(X);   * NUMBER OF SMOOTHING VARIABLES;
N = NROW(X);   * NUMBER OF OBSERVATIONS;

* GET COVARIANCE MATRIX FROM SAS-DATASET BETAS;
USE &BETAS;
READ ALL INTO S(|COLNAME=VARNAMES|);
S = S(|2:NROW(S),1:(NCOL(S)-1)|);   * COVARIANCE MATRIX;
VARNAMES = VARNAMES(|1,2:(NCOL(VARNAMES)-1)|);
* THE NAMES OF THE COVARIATES USED IN THE LOGISTIC MODEL ARE READ FROM DATASET BETAS INTO VARIABLE VARNAMES;

* GET COVARIATES AND PREDICTED PROBABILITY FROM SAS-DATASET PRED;
USE &PRED;
READ ALL VAR VARNAMES INTO Z;   * COVARIATES IN MODEL;
READ ALL VAR {_P_} INTO P;      * PREDICTED PROBABILITY;
READ ALL VAR {&DEP} INTO Y;     * DEPENDENT VARIABLE;

Z = J(N,1) || Z;   * JOIN Z WITH VECTOR OF ONES TO TREAT INTERCEPT;

* DEFINE BANDWIDTH AND BOUNDARY OF SMOOTHING REGION;
BANDWID = &BANDWID;
IF &BANDWID = -1 THEN BANDWID = (4/(N##(1/(2#D))));
* IF NO BANDWIDTH IS SPECIFIED THE BANDWIDTH IS SET TO THE DEFAULT VALUE;

BOUND = 0.5 # BANDWID;

* COMPUTE RESIDUALS;
RES = (Y-P)/SQRT(P#(1-P));
FREE Y;   * DISCARD Y;

* STANDARDIZE X;
MEAN = X(|+, |)/N;
VAR = ((X#X)(|+, |)-MEAN#MEAN#N)/(N-1);
MEAN = REPEAT(MEAN,N,1);
VAR = REPEAT(VAR,N,1);
X = (X-MEAN)/(SQRT(VAR));
FREE MEAN VAR;   * DISCARD MEAN VAR;

* AGGREGATE DATA BY SMOOTHING VARIABLES;
START AGGREGAT;

Z = Z#SQRT(P#(1-P));   * NECESSARY IN COMPUTING CORRECTION FOR MEAN T;
FREE P;                * DISCARD P;

TOTAL = J(N,1,0);      * INITIALIZE VECTOR TOTAL;
M=0; J=1;
X = X//J(1,D,-100000); * JOIN X TEMPORARY WITH AN EXTRA ROW;
DO I=2 TO N+1;
  IF ANY(X(|I, |) ^= X(|I-1, |)) THEN DO;
    M = M+1;
    Z(|M, |) = Z(|J:I-1, |)(|+, |);
    RES(|M, |) = RES(|J:I-1, |)(|+, |);
    X(|M, |) = X(|J, |);
    TOTAL(|M, |) = I-J;
    * SUM COVARIATES AND RESIDUALS OF OBSERVATIONS WITH THE SAME SMOOTHING VARIABLES;
    J = I;
  END;
END;
* M NOW CONTAINS THE NUMBER OF AGGREGATED GROUPS, TOTAL CONTAINS THE NUMBER OF OBSERVATIONS IN EACH GROUP;

X = X(|1:M, |);
Z = Z(|1:M, |);
RES = RES(|1:M, |);
TOTAL = TOTAL(|1:M, |);
FINISH;
RUN AGGREGAT;

* START LOOP IN WHICH T, ITS MEAN AND VARIANCE ARE CALCULATED;
START LOOP;
T = 0;   * INITIALIZE;
VART = 0;
ETCOR = 0;

DO I=1 TO M;
  XI = X(|I, |);           * VALUE OF SMOOTHING VARIABLES IN ITH GROUP;
  TOTALI = TOTAL(|I, |);   * NUMBER OF OBSERVATIONS IN ITH GROUP;
  XI = REPEAT(XI,M,1);

  * COMPUTE WEIGHTS FOR I-TH OBSERVATION;
  W = ((ABS(X-XI))(|,<>|) <= BOUND);
  * WIJ = 1, IF |XIK-XJK| <= BOUND FOR ALL K, 1<=K<=D;

  * TEST STATISTIC;
  SUMW = SUM(W#TOTAL);
  SUMRES = SUM(W#RES);

  * ASYMPTOTIC VARIANCE;
  VART = VART + SUMW#TOTALI;

  * CORRECTION FOR MEAN;
  SUMZ = (W#Z)(|+, |);
  SUMZ = SUMZ # (1/SQRT(SUMW));
  ETCOR = ETCOR + (SUMZ * S * SUMZ`)#TOTALI;

END;
FINISH;
RUN LOOP;

* COMPUTE RESULTS;
T = T/N;
MEAN = 1 - ETCOR/N;
VART = 2#(2/3)##D # VART/(N#N);

* PROBABILITY SCALED CHI-SQUARED DISTRIBUTION;
V = 2 # MEAN # MEAN / VART;
C = VART/(2#MEAN);
P = 1 - PROBCHI((T/C),V);

* PRODUCE OUTPUT;
RESET NONAME;
R = {' (STANDARDIZED) BANDWIDTH USED IS '}; C = {' '};
R2 = {' TVALUE'}; C2 = {' '};
R3 = {' PVALTH'}; C3 = {' '};

PRINT BANDWID(|ROWNAME=R COLNAME=C FORMAT=F6.3|);
PRINT T(|ROWNAME=R2 COLNAME=C2 FORMAT=F15.4|);
PRINT P(|ROWNAME=R3 COLNAME=C3 FORMAT=F6.4|);

QUIT;   * END IML;

* END MACRO FITTEST;
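The last step of the macro above converts the statistic into a p-value through a scaled chi-squared approximation. Reading the assignments to V, C and P directly, the calculation can be written as follows, where the symbols E-hat and Var-hat stand for the macro quantities MEAN and VART and are notation introduced here for illustration.

```latex
% Moment-matched scaled chi-squared approximation used for the p-value
\frac{\hat{T}}{c} \;\approx\; \chi^2_{\nu},
\qquad
c = \frac{\widehat{\mathrm{Var}}(\hat{T})}{2\,\hat{E}(\hat{T})},
\qquad
\nu = \frac{2\,\hat{E}(\hat{T})^{2}}{\widehat{\mathrm{Var}}(\hat{T})},
\qquad
p = 1 - \Pr\!\left(\chi^2_{\nu} \le \hat{T}/c\right)

% Check: c\,\chi^2_{\nu} has mean c\nu = \hat{E}(\hat{T})
% and variance 2c^{2}\nu = \widehat{\mathrm{Var}}(\hat{T})
```

The constants c and nu are chosen so that the first two moments of the scaled chi-squared variable match the estimated mean and variance of the statistic.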

SAS Macro for Other Goodness of Fit Test Statistics

/***************************************************************************/
/*                 S A S   M A C R O   G O F T E S T S                     */
/***************************************************************************/
/* Performs logistic regression on generated data and calculates or        */
/* extracts goodness of fit tests                                          */
/***************************************************************************/
%macro GOFTESTS(r,n,prop,varianc,distn);
options nodate nonumber nonotes;

/********************************************************/
/* Route output to an External File Used in Next Step   */
/********************************************************/

/********************************************/
/* Logistic Procedure                       */
/********************************************/

proc logistic data=pownor2.samn1&k descending COVOUT OUTEST=BETAS;
  model y=x / ctable pprob=0.5 lackfit rsquare scale=none aggregate;
  output out=out0 reschi=reschi difdev=difdev resdev=resdev pred=_P_;
run;

data outh0; set out0;
  qhat=1-_P_;
  if (_P_ ge 0.0 and _P_ < 0.1) then decil='D01';
  if (_P_ ge 0.1 and _P_ < 0.2) then decil='D02';
  if (_P_ ge 0.2 and _P_ < 0.3) then decil='D03';
  if (_P_ ge 0.3 and _P_ < 0.4) then decil='D04';
  if (_P_ ge 0.4 and _P_ < 0.5) then decil='D05';
  if (_P_ ge 0.5 and _P_ < 0.6) then decil='D06';
  if (_P_ ge 0.6 and _P_ < 0.7) then decil='D07';
  if (_P_ ge 0.7 and _P_ < 0.8) then decil='D08';
  if (_P_ ge 0.8 and _P_ < 0.9) then decil='D09';
  if (_P_ ge 0.9 and _P_ le 1.0) then decil='D10';

proc sort data=outh0; by decil; run;

proc summary data=outh0; by decil; var y _P_ qhat;
  output out=outhh sum=ysm _P_sm qhatsm n;
run;

data outhh; set outhh;
  obs1=ysm; exp1=_P_sm;
  obs0=_FREQ_-obs1; exp0=qhatsm;
  h1=((obs1-exp1)**2)/exp1;
  h0=((obs0-exp0)**2)/exp0;
  drop _TYPE_ _FREQ_ ysm _P_sm qhatsm;
run;

proc means data=outhh sum noprint; var h1 h0;
  output out=outhak sum=h1sm h0sm;
run;

data outhak; set outhak; H_hat=h1sm+h0sm; run;

/***************************************/
/* Brown's Score                       */
/***************************************/

proc sort data=out0; by _P_; run;

data temp.outbrn2; set out0;
  if _P_ = 0 then _P_=_P_+0.000001;
  if _P_ = 1 then _P_=_P_-0.000001;
  qhat=1-_P_;
  s1=(y-_P_)*(1+log(_P_)/qhat);
  s2=-(y-_P_)*(1+log(qhat)/_P_);

  sigss11=_P_*qhat*(1+log(_P_)/qhat)**2;
  sigss21=-_P_*qhat*(1+log(_P_)/qhat)*(1+log(qhat)/_P_);
  sigss12=-_P_*qhat*(1+log(_P_)/qhat)*(1+log(qhat)/_P_);
  sigss22=_P_*qhat*(1+log(qhat)/_P_)**2;
  sigst11=_P_*qhat*(1+log(_P_)/qhat);
  sigst21=-_P_*qhat*(1+log(qhat)/_P_);
  sigst12=x*_P_*qhat*(1+log(_P_)/qhat);
  sigst22=-x*_P_*qhat*(1+log(qhat)/_P_);
  sigts11=_P_*qhat*(1+log(_P_)/qhat);
  sigts21=x*_P_*qhat*(1+log(_P_)/qhat);
  sigts12=-_P_*qhat*(1+log(qhat)/_P_);
  sigts22=-x*_P_*qhat*(1+log(qhat)/_P_);
  sigtt11=_P_*qhat;
  sigtt21=x*_P_*qhat;
  sigtt12=x*_P_*qhat;
  sigtt22=(x**2)*_P_*qhat;
run;

proc means data=temp.outbrn2 n sum mean noprint;
  var s1 s2 sigss11 sigss21 sigss12 sigss22 sigst11 sigst21 sigst12 sigst22
      sigts11 sigts21 sigts12 sigts22 sigtt11 sigtt21 sigtt12 sigtt22;
  output out=temp.outbak sum=ms1 ms2 msigss11 msigss21 msigss12 msigss22 msigst11
      msigst21 msigst12 msigst22 msigts11 msigts21 msigts12 msigts22
      msigtt11 msigtt21 msigtt12 msigtt22;
run;

data pvalbrn; set temp.outbak;
  /* Inverse of the 2 x 2 matrix of msigtt sums */
  bottt=msigtt11*msigtt22-msigtt12*msigtt21;

  invtt11= msigtt22/bottt;
  invtt21=-msigtt12/bottt;
  invtt12=-msigtt21/bottt;
  invtt22= msigtt11/bottt;

  subtot11=invtt11*msigts11+invtt12*msigts21;
  subtot21=invtt21*msigts11+invtt22*msigts21;
  subtot12=invtt11*msigts12+invtt12*msigts22;
  subtot22=invtt21*msigts12+invtt22*msigts22;

  keep brownscr size prop distn varianc obs;
run;

/***************************************************/
/* Append Files from Each Replication              */
/***************************************************/

proc append base=newbrn data=pvalbrn; run;

proc append base=outh1 data=outhak; run;

proc printto; run;

/**********************************************************/
/* Extract Records off the Output File for Statistics     */
/* that are part of the Output from the Logistic Procedure */
/**********************************************************/

/*********************************************/
/* Pearson Residuals                         */
/*********************************************/
data pvalpear; infile allout; input dummy $ @;
  if dummy='Pearson' then do;
    input df pearschi peartest pvalpear;
    if pearschi ne . then output;
  end;

data pvalpear; set pvalpear;
  if pvalpear > 0.05 then do;
    rejpear=0;
  end;
  else do;
    rejpear=1;
  end;
  size=&n; prop=&prop; distn=&distn; varianc=&varianc;
  keep pvalpear pearschi rejpear size prop distn;
run;

proc append base=pownor2.pvalpear data=pvalpear; run;

/**********************************************/
/* Deviance Residuals                         */
/**********************************************/
data pvaldev1; infile allout; input dummy $ @;
  if dummy='Deviance' then do;
    input df Deviance devtest pvaldev;
    if Deviance ne . then output;
  end;

data pvaldev1; set pvaldev1;
  if pvaldev > 0.05 then do;
    rejdev=0;
  end;
  else do;
    rejdev=1;
  end;
  size=&n; prop=&prop; distn=&distn; varianc=&varianc;
run;

proc append base=pownor2.pvaldev1 data=pvaldev1; run;

/***************************************************/
/* le Cessie and van Houwelingen's T-Hat           */
/***************************************************/

data T2; infile allout; input X1 $ @;
  if X1='TVALUE' then do;
    input TVALUE X2 PVALTH;
    output;
  end;
  drop X1 X2;
run;

data lecess; set T2;
  if PVALTH > 0.05 then REJTHAT=0;
  if PVALTH <= 0.05 then REJTHAT=1;
  if PVALTH = . then REJTHAT=.;
  size=&n; prop=&prop; distn=&distn; varianc=&varianc; obs=_n_;
run;

proc append base=pownor2.lecesst data=lecess; run;

/***************************************************/
/* Hosmer & Lemeshow's C-hat                       */
/***************************************************/
data formc1; infile allout; input dummy $ @;
  if dummy='Goodness' then do;
    input d1 d2 C_hat d3 df prob;
    output;
  end;
run;

/***************************************************/
/* Calculate the P-values for Various Statistics   */
/* and Tally the number of Rejections              */
/***************************************************/

/*******************************************************/
/* Calculate P-value for Hosmer & Lemeshow's C-hat     */
/*******************************************************/
data pvalchat; set formc1;
  pvalc=1-probchi(C_hat,df);
  if pvalc > 0.05 then do;
    rejectc=0;
  end;
  else do;
    rejectc=1;
  end;
  drop d1 d2 d3 prob dummy;
  size=&n; prop=&prop; distn=&distn; varianc=&varianc;
run;

proc append base=pownor2.pvalchat data=pvalchat; run;

/*********************************************************/
/* Calculate P-value for Brown's Score                   */
/*********************************************************/
data pvalbrn; set newbrn;
  if brownscr ne . then do;
    if brownscr < 0 then do;
      pvalbrn=1;
    end;
    if brownscr ge 0 then do;
      pvalbrn=1-probchi(brownscr,2);
    end;
    if pvalbrn > 0.05 then do;
      rejbrown=0;
    end;
    else do;
      rejbrown=1;
    end;
  end;
  keep pvalbrn brownscr rejbrown;

proc append base=pownor2.pvalbrn data=pvalbrn; run;

/*****************************************************/
/* Calculate P-value for Hosmer & Lemeshow's H-hat   */
/*****************************************************/
data pvalhhat; set outh1;
  if H_hat ne . then do;
    pvalhhat=1-probchi(H_hat,8);
    if pvalhhat > 0.05 then do;
      rejhhat=0;
    end;
    else do;
      rejhhat=1;
    end;
  end;
  size=&n; prop=&prop; distn=&distn; varianc=&varianc;
  keep pvalhhat H_hat rejhhat size prop distn;
run;

proc append base=pownor2.pvalhhat data=pvalhhat; run;

proc printto; run;

%mend GOFTESTS;

proc printto; run;

/**********************************************************************/
/*            E N D   O F   M A C R O   G O F T E S T S               */
/**********************************************************************/
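The H-hat calculation in GOFTESTS groups observations into ten fixed intervals of the fitted probability and sums a Pearson-type contribution for events and non-events in each interval. Written out from the data steps above, with the subscript g indexing the intervals D01 to D10 (the symbols are notation introduced here, not taken from the thesis text):

```latex
% o_{1g} = obs1, e_{1g} = exp1, o_{0g} = obs0, e_{0g} = exp0 in interval g
\hat{H} \;=\; \sum_{g=1}^{10}
\left[
\frac{(o_{1g} - e_{1g})^{2}}{e_{1g}}
+
\frac{(o_{0g} - e_{0g})^{2}}{e_{0g}}
\right],
\qquad
o_{1g} = \sum_{i \in g} y_i,\quad
e_{1g} = \sum_{i \in g} \hat{p}_i,\quad
o_{0g} = n_g - o_{1g},\quad
e_{0g} = \sum_{i \in g} (1 - \hat{p}_i)
```

The macro then refers H-hat to a chi-squared distribution with 8 degrees of freedom through `1-probchi(H_hat,8)`. Intervals that contain no observations contribute no term, since PROC SUMMARY produces no row for them.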

/****************************************************************/
/* Multiple logistic regression performed on the results of the */
/* univariate logistic regression performed on the simulated data */
/****************************************************************/
proc genmod data=master.sd2;
  class prop varianc distn size;
  model rejhhat = size prop varianc distn size*prop size*varianc size*distn
        prop*varianc prop*distn varianc*distn / dist=bin link=logit type3;
run;


/*****************************************************************/
/* Add frequencies of rejections for each goodness of fit test   */
/* for each sample size category for each case                   */
/*****************************************************************/
data s20 s50 s100 s200 s500; set master.sd2;
  if size=20 then output s20;
  if size=50 then output s50;
  if size=100 then output s100;
  if size=200 then output s200;
  if size=500 then output s500;
run;

data s20; set s20; by case; retain c d p h b t count;
  if first.case then do;
    count=1; c=rejc2; d=rejdev2; p=rejpear2; h=rejhhat; b=rejbrown; t=rejthat;
  end;
  else do;
    c=sum(c,rejc2); d=sum(d,rejdev2); p=sum(p,rejpear2); h=sum(h,rejhhat);
    b=sum(b,rejbrown); t=sum(t,rejthat); count=sum(count,1);
  end;
  if last.case then output;
  keep size case prop distn alpha beta varianc c d p h b t count;
run;

data s50; set s50; by case; retain c d p h b t count;
  if first.case then do;
    count=1; c=rejc2; d=rejdev2; p=rejpear2; h=rejhhat; b=rejbrown; t=rejthat;
  end;
  else do;
    c=sum(c,rejc2); d=sum(d,rejdev2); p=sum(p,rejpear2); h=sum(h,rejhhat);
    b=sum(b,rejbrown); t=sum(t,rejthat); count=sum(count,1);
  end;
  if last.case then output;
  keep size case prop distn alpha beta varianc c d p h b t count;
run;

data s100; set s100; by case; retain c d p h b t count;
  if first.case then do;
    count=1; c=rejc2; d=rejdev2; p=rejpear2; h=rejhhat; b=rejbrown; t=rejthat;
  end;
  else do;
    c=sum(c,rejc2); d=sum(d,rejdev2); p=sum(p,rejpear2); h=sum(h,rejhhat);
    b=sum(b,rejbrown); t=sum(t,rejthat); count=sum(count,1);
  end;
  if last.case then output;
  keep size case prop distn alpha beta varianc c d p h b t count;
run;

data s200; set s200; by case; retain c d p h b t count;
  if first.case then do;
    count=1; c=rejc2; d=rejdev2; p=rejpear2; h=rejhhat; b=rejbrown; t=rejthat;
  end;
  else do;
    c=sum(c,rejc2); d=sum(d,rejdev2); p=sum(p,rejpear2); h=sum(h,rejhhat);
    b=sum(b,rejbrown); t=sum(t,rejthat); count=sum(count,1);
  end;
  if last.case then output;
  keep size case prop distn alpha beta varianc c d p h b t count;
run;

data s500; set s500; by case; retain c d p h b t count;
  if first.case then do;
    count=1; c=rejc2; d=rejdev2; p=rejpear2; h=rejhhat; b=rejbrown; t=rejthat;
  end;
  else do;
    c=sum(c,rejc2); d=sum(d,rejdev2); p=sum(p,rejpear2); h=sum(h,rejhhat);
    b=sum(b,rejbrown); t=sum(t,rejthat); count=sum(count,1);
  end;
  if last.case then output;
  keep size case prop distn alpha beta varianc c d p h b t count;
run;

data sall; set s20 s50 s100 s200 s500;

References

Azzalini, A., Bowman, A.W., Härdle, W. (1989). On the use of nonparametric regression for model checking. Biometrika 76: 1-12.

Brown, C.C. (1982). On a goodness of fit test for the logistic model based on score statistics. Communications in Statistics - Theory and Methods 11(10): 1097-1105.

Bruce, I.N., Urowitz, M.B., Gladman, D.D., and Hallett, D.C. (1999). The natural history of hypercholesterolaemia in SLE. In Press, Journal of Rheumatology.

Collett, D. (1991). Modelling binary data. Chapman and Hall, London.

Copas, J.B. (1983). Plotting p against x. Applied Statistics 32: 25-31.

Cox, D.R., and Snell, E.J. (1989). The analysis of binary data, 2nd edition. Chapman and Hall, London.

Fowlkes, E.B. (1987). Some diagnostics for binary logistic regression via smoothing. Biometrika 74: 503-515.

Gordon, T., Kannel, W.B., and Halperin, M. (1979). Prediction of coronary heart disease. Journal of Chronic Diseases 32: 427-440.

Hosmer, D.W., and Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in Statistics A10: 1043-69.

Hosmer, D.W., Lemeshow, S., and Klar, J. (1988). Goodness of fit testing for multiple logistic regression analysis when the estimated probabilities are small. Biometrical Journal 30: 911-924.

Hosmer, D.W., and Lemeshow, S. (1989). Applied logistic regression. John Wiley and Sons Inc., New York, NY.

Hosmer, D.W., Hosmer, T., le Cessie, S., and Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine 16(9): 965-80.

Landwehr, J.M., Pregibon, D., and Shoemaker, A.C. (1984). Graphical methods for assessing logistic regression models (with discussion). Journal of the American Statistical Association 79: 61-83.

le Cessie, S., and van Houwelingen, J.C. (1991). A goodness of fit test for binary regression models, based on smoothing methods. Biometrics 47: 1267-1282.

le Cessie, S., and van Houwelingen, J.C. (1995). Testing the fit of a regression model via score tests in random effects models. Biometrics 51: 600-614.

Lemeshow, S., and Hosmer, D.W. (1982). A review of goodness of fit statistics for use in the development of logistic regression models. American Journal of Epidemiology 115: 92-106.

Maddala, G.S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge University Press.

Magee, L. (1990). R² measures based on Wald and likelihood ratio joint significance tests. American Statistician 44: 250-253.

McCullagh, P., and Nelder, J.A. (1989). Generalized linear models, 2nd edition. Chapman and Hall, London.

Nagelkerke, N.D. (1991). A note on a general definition of the coefficient of determination. Biometrika 78(3): 691-692.

Nelder, J.A., and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society Series A 135: 761-768.

Pregibon, D. (1981). Logistic regression diagnostics. Annals of Statistics 9: 705-724.

Prentice, R.L. (1976). A generalization of the probit and logit methods for dose response curves. Biometrics 32: 761-768.

SAS Institute Inc. (1990). SAS Language: Reference, Version 6, 1st edition. SAS Institute Inc., Cary, NC, USA.

SAS Institute Inc. (1993). User's guide volume 2, version 6, 4th edition. SAS Institute Inc., Cary, NC, USA.

SAS Institute Inc. (1997). Software: Changes and enhancements through release 6.12. SAS Institute Inc., Cary, NC, USA.

Simonoff, J.S. (1998). Logistic regression, categorical predictors, and goodness-of-fit: It depends on who you ask. The American Statistician 52(1): 10-14.

Stukel, T.A. (1988). Generalized logistic models. Journal of the American Statistical Association 83: 426-431.

Tsiatis, A.A. (1980). A note on a goodness-of-fit test for the logistic regression model. Biometrika 67: 250-251.
