Chapter 16 - Logistic Regression Model
1.1 Introduction

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last decade, the logistic regression model has become, in many fields, the standard method of analysis in this situation.1
Before beginning a study of logistic regression, it is important to understand that the goal of an analysis using this method is the same as that of any model-building technique used in statistics: to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables. These independent variables are often called covariates. The most common example of modeling, and one assumed to be familiar to the readers of this text, is the usual linear regression model, where the outcome variable is assumed to be continuous.1
What distinguishes a logistic regression model from the linear regression model is that the
outcome variable in logistic regression is binary or dichotomous. The difference between
logistic and linear regression is reflected both in the choice of a parametric model and in the
assumptions.1
In this chapter, we focus on logit analysis (a.k.a. logistic regression analysis) as an optimal method for the regression analysis of dichotomous (binary) dependent variables. Before considering the full model, let's examine one of its components: the odds of an event.2
1.2 Odds and Odds Ratios

To appreciate the logit model, it is helpful to have an understanding of odds and odds ratios. Most people regard probability as the natural way to quantify the chances that an event will occur. We automatically think in terms of numbers ranging from 0 to 1, with a 0 meaning that the event will certainly not occur, and a 1 meaning that the event certainly will occur.2

Probability can be computed as follows:

P(event) = number of cases in which the event occurs / total number of cases

For example, using Table 1 below, the probability of a death sentence is 50/147 = 0.34.
However, there are other ways of representing the chances of an event, one of which, the odds, has a nearly equal claim to being natural. Consider Table 1, which shows the cross-tabulation of race of defendant by death sentence for the 147 penalty-trial cases. The numbers in the table are the actual numbers of cases that have the stated characteristics.
Table 1: Death Sentences by Race of Defendant for 147 Penalty Trials

Sentence   Black   Nonblack   Total
Death       28        22        50
Life        45        52        97
Total       73        74       147
Odds = P(event) / [1 - P(event)] = number of events / number of non-events

Odds ratio = odds(group 1) / odds(group 2), or exp(b). The odds ratio represents the change in the odds of being in one category of the outcome when the value of a predictor increases by one unit.

For example (additional exercise, April 2010):

I. Odds of a death sentence = 50/97 = 0.52
II. Odds of a death sentence for blacks = 28/45 = 0.62
III. Odds of a death sentence for nonblacks = 22/52 = 0.42
IV. Odds ratio of blacks to nonblacks = (28/45)/(22/52) = 1.47

Interpretation: we may say that the odds of a death sentence for blacks are 47% higher than for nonblacks. We can also say that the odds of a death sentence for nonblacks are 1/1.47 = 0.68 times the odds of a death sentence for blacks. So, depending on which categories we are comparing, we either get an odds ratio greater than 1 or its reciprocal, which is less than 1.
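The arithmetic above can be checked with a short script. This is an illustrative sketch using the counts in Table 1; the variable names are my own, not part of the original exercise.

```python
# Odds and odds ratio from Table 1 (death sentences by race, 147 penalty trials).
death_black, life_black = 28, 45
death_nonblack, life_nonblack = 22, 52

# Odds = P(event) / (1 - P(event)), which reduces to events / non-events.
odds_overall = (death_black + death_nonblack) / (life_black + life_nonblack)
odds_black = death_black / life_black
odds_nonblack = death_nonblack / life_nonblack

# Odds ratio: how the odds change between the two groups.
odds_ratio = odds_black / odds_nonblack

print(f"overall odds   = {odds_overall:.2f}")   # 50/97 -> 0.52
print(f"odds, black    = {odds_black:.2f}")     # 28/45 -> 0.62
print(f"odds, nonblack = {odds_nonblack:.2f}")  # 22/52 -> 0.42
print(f"odds ratio     = {odds_ratio:.2f}")     # -> 1.47
```

Note that 1/odds_ratio gives the reciprocal comparison (nonblacks to blacks) discussed above.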
1.3 The Logit Model

Now we're ready to introduce the logit model, otherwise known as the logistic regression model. For k explanatory variables and i = 1, ..., n individuals, the model is

log[p_i / (1 - p_i)] = b_0 + b_1 x_i1 + b_2 x_i2 + ... + b_k x_ik

where p_i is the probability that y_i = 1. The expression on the left-hand side is usually referred to as the logit or log-odds. The logit, being the log of the odds, is not only linear in the x's but also linear in the parameters.
Positive logit values indicate that the odds are in favour of an event happening, while negative logit values indicate that the odds are against the occurrence of an event.

We can solve the logit equation for p_i to obtain

p_i = exp(b_0 + b_1 x_i1 + ... + b_k x_ik) / [1 + exp(b_0 + b_1 x_i1 + ... + b_k x_ik)]

We can simplify further by dividing both numerator and denominator by the numerator itself:

p_i = 1 / [1 + exp(-(b_0 + b_1 x_i1 + ... + b_k x_ik))]

In mathematical terms, this formula is called the logistic function. Writing x for the logit, it can be written as:

p = 1 / (1 + e^(-x))

The logit x ranges from minus infinity to plus infinity, p ranges between 0 and 1, and p is nonlinearly related to x.

Simple logit model

Let y be a binary (0/1) outcome and let x be a single explanatory variable, so that

log[p / (1 - p)] = b_0 + b_1 x

Hence,

p = exp(b_0 + b_1 x) / [1 + exp(b_0 + b_1 x)]
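A minimal sketch of the logit and its inverse, the logistic function, follows; the function names are mine.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def logistic(x):
    """Inverse of the logit: maps any real x to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The two functions are inverses of each other.
p = 0.34  # e.g. the overall probability of a death sentence, 50/147
x = logit(p)  # negative, since the odds are against the event
print(round(logistic(x), 2))  # 0.34
print(logistic(0))            # 0.5: a logit of zero means even odds
```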
1.4 Interpretation of the Odds Ratio

If, for example, the estimated coefficient for a smoking indicator (1 = smoker, 0 = nonsmoker) is b = ln 3 = 1.10, then

Odds ratio = exp(b) = 3

This odds ratio indicates that a smoker is 3 times more likely to develop lung cancer compared to a nonsmoker.
1.5 Applying Logistic Regression

This statistical method was applied to a data set to assess its ability to classify a baby as low birth weight or normal based on several predictor variables.
Description of Variables

Variable   Description        Type          Coding / Units
Y          Birth weight       Categorical   1 = low birth weight, 0 = normal
X1         Race               Categorical   2 = Malay, 1 = Chinese, 0 = Indian
X2         Gender             Categorical   1 = male, 0 = female
X3         Mother's age       Continuous    years
X4         Father's income    Continuous    RM
X5         Parity             Integer       number of children
X6         Abortion           Categorical   1 = yes, 0 = no
X7         Mother's height    Continuous    cm
X8         Vitamin            Continuous    mg
X9         Weight gain        Continuous    kg
X10        Antenatal visits   Integer       number of times
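The chapter fits this model in SPSS; purely as an illustration of what the estimation step does, here is a toy maximum-likelihood fit by gradient ascent for a single-predictor logit model. The data and all names below are fabricated for the sketch and are not from the birth-weight study.

```python
import math

def fit_logit(xs, ys, lr=0.01, steps=10000):
    """Fit log[p/(1-p)] = b0 + b1*x by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x    # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Fabricated example: the outcome becomes more likely as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logit(xs, ys)

def p_at(x):
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, P(y=1 | x=9) = {p_at(9):.2f}")
```

Software such as SPSS uses more sophisticated optimization (iteratively reweighted least squares) and also reports standard errors, but the fitted coefficients answer the same question: how the log-odds of the outcome change per unit of each predictor.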
Table 2: SPSS Results for Multiple Logistic Regression.
The estimated logistic regression model obtained:
where
Interpreting the B values

The B values provided in the SPSS output are equivalent to the b values obtained in a multiple regression analysis. These are the values that you would use in an equation to calculate the probability of a case falling into a specific category. You should check whether your B values are positive or negative. This will tell you about the direction of the relationship (increase/decrease).
Tests concerning the coefficients

The crucial statistic is the Wald statistic, which has a chi-square distribution and tells us whether the coefficient for that predictor is significantly different from zero. If the coefficient is significantly different from zero, then we assume that the predictor is making a significant contribution to the prediction of the outcome (Y). In this sense it is analogous to the t-tests found in multiple regression.3

Decision rule: reject H0 (the coefficient equals zero) if the Wald test's p-value is less than alpha; do not reject H0 if the Wald test's p-value is greater than or equal to alpha.
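As a sketch, the Wald statistic for a single coefficient can be computed as (b/SE)^2 and compared to a chi-square distribution with one degree of freedom; equivalently, b/SE is standard normal under H0, so the p-value can be obtained from the normal CDF. The coefficient and standard error below are made-up numbers for illustration, not values from Table 2.

```python
import math

def wald_test(b, se):
    """Wald chi-square statistic and two-sided p-value for H0: beta = 0."""
    z = b / se
    wald = z * z
    # P(Z <= |z|) for a standard normal Z, via the complementary error function.
    phi = 0.5 * math.erfc(-abs(z) / math.sqrt(2))
    p_value = 2 * (1 - phi)
    return wald, p_value

wald, p = wald_test(b=1.0, se=0.5)
print(round(wald, 2), round(p, 4))  # 4.0 0.0455
```

With a p-value of about 0.045 < 0.05, this hypothetical coefficient would be judged significantly different from zero.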
I. The Omnibus Tests of Model Coefficients give an overall indication of how well the model performs compared with a model with none of the predictors entered. For this result, we want a highly significant value (the p-value must be less than 0.05).
II. The Hosmer-Lemeshow test will be used to test the goodness of fit of the model.
III. The Cox & Snell R-square and Nagelkerke R-square values provide an indication of the amount of variation in the dependent variable explained by the model.
IV. The classification table was also used in this study to see how well the model is able to predict the correct category. This table also provides the sensitivity and specificity of the model. Sensitivity measures the proportion of actual positives which are correctly identified, whereas specificity measures the proportion of actual negatives which are correctly identified. A model that is high in both sensitivity and specificity is good and can be used for prediction.
H0: The logistic regression model is a good fit for the data.
H1: The logistic regression model is not a good fit for the data.
Decision rule: reject H0 if the chi-square statistic exceeds the critical value or if the p-value is less than alpha; otherwise accept H0.
Since the p-value (0.511) > alpha, we accept H0. We can conclude that the logistic regression model is a good fit for the data.

The R-square values suggest that this model can explain about 15.7 to 24.6 percent of the total variation in the dependent variable.
Example of Sensitivity and Specificity Analysis
Overall predictive efficiency = 74.1%
Sensitivity (actual positives which are correctly identified) = 141/170 = 0.8294 or 82.9%
Specificity (actual negatives which are correctly identified) = 54/93 = 0.5806 or 58.1%
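These figures can be reproduced from the classification-table counts. The true-positive and true-negative counts below are taken from the fractions just given (141 of 170 actual positives, 54 of 93 actual negatives); the helper names are mine.

```python
# Counts from the classification table: 170 actual positives, 93 actual negatives.
tp, fn = 141, 170 - 141   # actual positives: correctly / incorrectly classified
tn, fp = 54, 93 - 54      # actual negatives: correctly / incorrectly classified

sensitivity = tp / (tp + fn)              # proportion of actual positives found
specificity = tn / (tn + fp)              # proportion of actual negatives found
overall = (tp + tn) / (tp + fn + tn + fp) # overall predictive efficiency

print(f"sensitivity = {sensitivity:.1%}")  # 82.9%
print(f"specificity = {specificity:.1%}")  # 58.1%
print(f"overall     = {overall:.1%}")      # 74.1%
```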
Based on these results, we can conclude that the logistic regression model for Bahasa Inggeris can be used to predict Form Four students' achievement.