Chapter 16 Logistic Regression Analysis
Content: Logistic regression; Conditional logistic regression; Application
Purpose: Derive the logistic regression equation, which is used to estimate the probability of the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a kind of nonlinear regression.
Data: 1. The dependent variable is a binary categorical variable with two values, such as "yes" and "no". 2. All, or at least most, of the independent variables should be categorical; some of them may be numerical. Categorical variables must be coded numerically.
Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and several risk factors. The $\chi^2$ test (or u test) has two drawbacks for this purpose: 1. it can study only one risk factor at a time; 2. it can yield only a qualitative conclusion.
Category: 1. Non-conditional logistic regression (unmatched, between-subjects data). 2. Conditional logistic regression (paired or matched data).
1 Logistic regression (non-conditional logistic regression)
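I Basic concepts
The probability of a positive outcome under the action of m independent variables $X_1, X_2, \ldots, X_m$ is written as:
$$P = P(Y = 1 \mid X_1, X_2, \ldots, X_m)$$
Regression model:
$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)]}$$
If $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$, then $P = \dfrac{1}{1 + e^{-Z}}$ and
$$\operatorname{logit} P = \ln\frac{P}{1 - P} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$$
Range: the probability P lies between 0 and 1; logit P lies between $-\infty$ and $+\infty$.
Here $\beta_0$ is the constant term and $\beta_1, \beta_2, \ldots, \beta_m$ are the regression coefficients.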
[Figure: the logistic function $P = 1/(1 + e^{-Z})$ plotted for Z from -4 to 4 (horizontal axis Z, vertical axis P from 0 to 1). P rises along an S-shaped curve from 0.01799 at Z = -4, through 0.5 at Z = 0, to 0.98201 at Z = 4.]
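The curve can be reproduced with a few lines of Python. This is only an illustrative sketch; the function names `logistic` and `logit` are chosen here and do not come from the slides.

```python
import math

def logistic(z):
    """Return P = 1 / (1 + e^(-z)); always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Return ln(p / (1 - p)), the inverse of logistic()."""
    return math.log(p / (1.0 - p))

# A few points on the S-shaped curve
for z in (-4, -2, 0, 2, 4):
    print(f"Z = {z:+d}  P = {logistic(z):.5f}")
```

Because `logit` undoes `logistic`, a model that is linear in logit P is the same model as the S-shaped curve in P.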
The meaning of the model parameters: The constant $\beta_0$ is the natural logarithm of the odds of the outcome occurring versus not occurring when the exposure is zero (all independent variables equal 0). The regression coefficient $\beta_j$ is the change in logit P when the independent variable $X_j$ increases by one unit.
The odds ratio (OR) is the statistical indicator used in epidemiology to measure the effect of a risk factor. It is computed as:
$$OR_j = \frac{P_1 / (1 - P_1)}{P_0 / (1 - P_0)}$$
In the formula, $P_1$ is the incidence of the disease when $X_j$ is $c_1$, and $P_0$ is the incidence of the disease when $X_j$ is $c_0$. $OR_j$ is called the adjusted odds ratio because the other variables have been adjusted for: it shows the effect of the risk factor free of the influence of the other independent variables.
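The relationship with logit P
Comparing the disease status at two different exposure levels ($c_1$, $c_0$) of one risk factor $X_j$, the natural logarithm of the odds ratio is:
$$\ln OR_j = \operatorname{logit} P_1 - \operatorname{logit} P_0 = \Big(\beta_0 + \beta_j c_1 + \sum_{t \ne j} \beta_t X_t\Big) - \Big(\beta_0 + \beta_j c_0 + \sum_{t \ne j} \beta_t X_t\Big) = \beta_j (c_1 - c_0)$$
that is, $OR_j = \exp[\beta_j (c_1 - c_0)]$.
$\beta_0$ is usually regarded as a nuisance parameter in risk-factor analysis, because $OR_j$ does not depend on $\beta_0$.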
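A small numerical check may make this concrete: the odds ratio for $X_j$ equals $\exp[\beta_j(c_1 - c_0)]$ whatever the constant term and whatever the values of the other variables. The coefficients below are hypothetical, picked only for illustration, and the helper names are not from the slides.

```python
import math

def prob(beta0, betas, xs):
    """P from the logistic model with constant beta0 and coefficients betas."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

def odds_ratio(beta0, betas, j, c1, c0, others):
    """OR for X_j at level c1 versus c0, other variables fixed at `others`."""
    x1 = list(others)
    x1[j] = c1
    x0 = list(others)
    x0[j] = c0
    return odds(prob(beta0, betas, x1)) / odds(prob(beta0, betas, x0))

betas = [0.8, -0.3]                               # hypothetical coefficients
or_a = odds_ratio(-1.0, betas, 0, 1, 0, [0, 0])
or_b = odds_ratio(2.5, betas, 0, 1, 0, [0, 1])    # different constant, different X_2
print(or_a, or_b, math.exp(0.8))
```

All three printed values agree, which is why the constant term carries no information about the odds ratio.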
II Parameter estimation for the logistic regression model
1. Parameter estimation
Theory: maximum likelihood estimation
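For reference, the likelihood that maximum likelihood estimation works with can be written in its usual textbook form. This is the standard expression for n independent binary observations, given here as an assumption rather than as the formula from the original slide; $Y_i$ and $P_i$ denote the observed outcome and the model probability for subject i.

```latex
L = \prod_{i=1}^{n} P_i^{Y_i} (1 - P_i)^{1 - Y_i},
\qquad
\ln L = \sum_{i=1}^{n} \left[ Y_i \ln P_i + (1 - Y_i) \ln (1 - P_i) \right]
```

The estimates $b_0, b_1, \ldots, b_m$ are the values of $\beta_0, \beta_1, \ldots, \beta_m$ that maximize $\ln L$; they are usually found by an iterative method such as Newton-Raphson.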
2. Estimation of OR
The estimated OR compares two different levels, $c_1$ and $c_0$, of one factor $X_j$:
$$\widehat{OR}_j = \exp[b_j (c_1 - c_0)]$$
If the independent variable $X_j$ has only two levels, exposed and non-exposed, the $1 - \alpha$ confidence interval of $OR_j$ is estimated by:
$$\exp(b_j \pm u_{\alpha/2} S_{b_j})$$
where $b_j$ is the estimate of $\beta_j$ and $S_{b_j}$ is its standard error.
e.g. 16-1: Table 16-1 gives case-control data used to study the relationship between smoking, drinking and esophageal cancer. Fit a logistic regression to these data. First define the coding of each variable.
Table 16-1 Case-control data on the relationship between smoking and esophageal cancer
Stratum g   Smoking X1   Drinking X2   Subjects n_g   Positive d_g   Negative n_g - d_g
1           0            0             199            63             136
2           0            1             170            63             107
3           1            0             101            44             57
4           1            1             416            265            151
Results:
$b_0 = -0.9099$, $S_{b_0} = 0.1358$; $b_1 = 0.8856$, $S_{b_1} = 0.1500$; $b_2 = 0.5261$, $S_{b_2} = 0.1572$
Logistic regression equation: $\operatorname{logit} P = -0.9099 + 0.8856 X_1 + 0.5261 X_2$
The OR of smoking versus nonsmoking: $\widehat{OR}_1 = \exp(b_1) = \exp(0.8856) \approx 2.42$
95% confidence interval of $OR_1$: $\exp(0.8856 \pm 1.96 \times 0.1500) \approx (1.81, 3.25)$
The OR of drinking versus not drinking: $\widehat{OR}_2 = \exp(b_2) = \exp(0.5261) \approx 1.69$
95% confidence interval of $OR_2$: $\exp(0.5261 \pm 1.96 \times 0.1572) \approx (1.24, 2.30)$
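The estimates of e.g. 16-1 can be approximated from the grouped data of Table 16-1 with a short Newton-Raphson routine. The following is a minimal sketch using only the Python standard library, not the software used for the original analysis; the function names `solve` and `fit` are made up for this illustration, and X1 and X2 are assumed to be 0/1 indicators as in the table.

```python
import math

# Table 16-1: (X1 smoking, X2 drinking, n_g subjects, d_g positives)
strata = [(0, 0, 199, 63), (0, 1, 170, 63), (1, 0, 101, 44), (1, 1, 416, 265)]

def solve(a, rhs):
    """Solve the linear system a x = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))   # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(strata, iterations=25):
    """Maximum likelihood fit of logit P = b0 + b1*X1 + b2*X2."""
    b = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3                      # first derivatives of ln L
        info = [[0.0] * 3 for _ in range(3)]   # information matrix
        for x1, x2, n, d in strata:
            x = (1.0, x1, x2)
            z = sum(bi * xi for bi, xi in zip(b, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(3):
                score[i] += x[i] * (d - n * p)
                for j in range(3):
                    info[i][j] += x[i] * x[j] * n * p * (1.0 - p)
        step = solve(info, score)
        b = [bi + si for bi, si in zip(b, step)]
    # Standard errors: square roots of the diagonal of the inverse information
    se = []
    for i in range(3):
        unit = [1.0 if k == i else 0.0 for k in range(3)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return b, se

b, se = fit(strata)
for name, bj, sj in zip(("b0", "b1", "b2"), b, se):
    print(f"{name} = {bj:.4f}  S = {sj:.4f}")
for j in (1, 2):
    lo = math.exp(b[j] - 1.96 * se[j])
    hi = math.exp(b[j] + 1.96 * se[j])
    print(f"OR{j} = {math.exp(b[j]):.2f}  95% CI = ({lo:.2f}, {hi:.2f})")
```

The printed coefficients and standard errors should agree with the reported results up to rounding, and the last two lines apply the confidence interval formula $\exp(b_j \pm 1.96 S_{b_j})$.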
III Hypothesis tests for the logistic regression model
1. Likelihood ratio test
2. Wald test: each parameter estimate is compared with zero, using its standard error as the yardstick. The statistics are:
$$u = \frac{b_j}{S_{b_j}} \quad \text{or} \quad \chi^2 = \left(\frac{b_j}{S_{b_j}}\right)^2, \quad \nu = 1$$
Both $\chi^2$ values are greater than 3.84, so esophageal cancer is related to both smoking and drinking. The conclusion is the same as above.
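As a worked illustration, the Wald statistics for e.g. 16-1 follow directly from the reported coefficients and standard errors. The dictionary labels below are chosen here for readability; 3.84 is the $\chi^2$ critical value for one degree of freedom at $\alpha = 0.05$.

```python
# (b_j, S_bj) as reported for e.g. 16-1
estimates = {"smoking (X1)": (0.8856, 0.1500), "drinking (X2)": (0.5261, 0.1572)}
CRITICAL = 3.84   # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Wald chi-square = (b_j / S_bj)^2
wald = {name: (b / s) ** 2 for name, (b, s) in estimates.items()}
for name, chi2 in wald.items():
    print(f"{name}: Wald chi-square = {chi2:.2f}, exceeds 3.84: {chi2 > CRITICAL}")
```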
IV Variable selection
Methods: forward selection, backward elimination and stepwise regression.
Test statistic: not the F statistic, but one of the likelihood ratio, Wald and score test statistics.
e.g. 16-2: To explore the risk factors for coronary heart disease, a case-control study was carried out on 26 patients with coronary heart disease and 28 controls. Table 16-2 and Table 16-3 give the definitions of the factors and the data. Use stepwise logistic regression to select the risk factors.
Table 16-2 Eight possible risk factors for coronary heart disease and their coding
Factor   Variable   Coding
Age      X1
Table 16-3 Case-control data on the risk factors for coronary heart disease
Order   X1   X2   X3   X4   X5   X6   X7   X8   Y
1       3    1    0    1    0    0    1    1    0
2       2    0    1    1    0    0    1    0    0
3       2    1    0    1    0    0    1    0    0
4       2    0    0    1    0    0    1    0    0
5       3    0    0    1    0    1    1    1    0
6       3    0    1    1    0    0    2    1    0
7       2    0    1    0    0    0    1    0    0
8       3    0    1    1    1    0    1    0    0
9       2    0    0    0    0    0    1    1    0
10      1    0    0    1    0    0    1    0    0
...     ...  ...  ...  ...  ...  ...  ...  ...  ...
51      2    0    1    1    0    1    2    1    1
52      2    1    1    1    0    0    2    1    1
53      2    1    0    1    0    0    1    1    1
54      3    1    1    0    1    0    3    1    1
Table 16-4 e.g. 16-2: independent variables entering the equation and estimates of the related parameters (learn how to read the results)
Model      Regression coefficient b   Standard error S_b   Wald chi-square   P        Standardized coefficient b'   OR
Constant   -4.705                     1.543                9.30              0.0023   --                            --
X1         0.924                      0.477                3.76              0.0525   0.401                         2.52
X5         1.496                      0.744                4.04              0.0443   0.406                         4.46
X6         3.136                      1.249                6.30              0.0121   0.703                         23.00
X8         1.947                      0.847                5.29              0.0215   0.523                         7.01
In the end, four risk factors enter the logistic regression model: increasing age ($X_1$), history of high blood lipids ($X_5$), animal fat intake ($X_6$) and type A personality ($X_8$).
The standardized regression coefficient
$$b'_j = \frac{b_j S_j}{\pi / \sqrt{3}}$$
can be used to compare the importance of the factors, where $S_j$ is the standard deviation of $X_j$ and $\pi = 3.1416$.
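One way to learn to read Table 16-4 is to recompute its derived columns from b and $S_b$. The sketch below assumes the Wald statistic is $(b/S_b)^2$, the last column is $OR = \exp(b)$, and the standardized coefficient is $b' = b S_j / (\pi/\sqrt{3})$ with $S_j$ the standard deviation of $X_j$; the variable names are chosen for this illustration. Small differences from the table come from rounding of the published b values.

```python
import math

# Table 16-4: variable -> (b, S_b, reported Wald, reported b', reported OR)
table = {
    "X1": (0.924, 0.477, 3.76, 0.401, 2.52),
    "X5": (1.496, 0.744, 4.04, 0.406, 4.46),
    "X6": (3.136, 1.249, 6.30, 0.703, 23.00),
    "X8": (1.947, 0.847, 5.29, 0.523, 7.01),
}
SCALE = math.pi / math.sqrt(3)   # standard deviation of the standard logistic distribution

rows = {}
for name, (b, s, wald_rep, b_std, or_rep) in table.items():
    wald = (b / s) ** 2                # Wald chi-square
    odds_ratio = math.exp(b)           # OR for a one-unit increase in X_j
    implied_sd = b_std * SCALE / b     # S_j implied by b' = b * S_j / (pi / sqrt(3))
    rows[name] = (wald, odds_ratio, implied_sd)
    print(f"{name}: Wald = {wald:.2f} (reported {wald_rep}), "
          f"OR = {odds_ratio:.2f} (reported {or_rep}), implied S_j = {implied_sd:.3f}")
```

Ranked by b', animal fat intake ($X_6$) is the most influential of the four factors, even though its standard error is also the largest.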
2 Conditional logistic regression
I Principle
In matched data, the most common design places one case and several controls in each matched set, that is, a 1:M matched study.
Table 16-5 Data format for 1:M conditional logistic regression
Matched set i   Number in set t*   Dependent variable Y   X1      X2      ...   Xm
1               0                  1                      X101    X102    ...   X10m
                1                  0                      X111    X112    ...   X11m
                2                  0                      X121    X122    ...   X12m
                ...
                M                  0                      X1M1    X1M2    ...   X1Mm
...
n               0                  1                      Xn01    Xn02    ...   Xn0m
                1                  0                      Xn11    Xn12    ...   Xn1m
                2                  0                      Xn21    Xn22    ...   Xn2m
                ...
                M                  0                      XnM1    XnM2    ...   XnMm
* t = 0 is the case and the others are the controls.
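The conditional logistic model:
$$P_i = \frac{1}{1 + \exp[-(\beta_{0i} + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)]}, \quad i = 1, 2, \ldots, n$$
$P_i$ is the probability of disease in stratum (matched set) i under the action of a group of risk factors, $\beta_{0i}$ is the effect of stratum i, and $\beta_1, \beta_2, \ldots, \beta_m$ are the parameters to be estimated.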
The difference from the non-conditional logistic regression model lies in the constant term: the $\beta_{0i}$ may differ from one matched set to another, but the model assumes that each risk factor's ability to cause disease (its $\beta_j$) is the same in all matched sets.
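The reason the stratum constants need not be estimated can be seen from the conditional likelihood commonly used for 1:M matching, in which each matched set contributes $\exp(\beta' x_{case}) / \sum_t \exp(\beta' x_t)$ and the constant cancels. The sketch below uses that standard form as an assumption (it is not spelled out in the slides), with hypothetical 1:2 data and made-up function names.

```python
import math

def set_log_likelihood(beta, case, controls):
    """Conditional log-likelihood contribution of one 1:M matched set.

    The stratum constant would appear in every term of the ratio and
    cancels, so only the risk-factor coefficients `beta` are needed.
    """
    def linear(x):
        return sum(b * v for b, v in zip(beta, x))
    denom = sum(math.exp(linear(x)) for x in [case] + controls)
    return linear(case) - math.log(denom)

def conditional_log_likelihood(beta, matched_sets):
    """Sum of the contributions of all matched sets."""
    return sum(set_log_likelihood(beta, case, controls)
               for case, controls in matched_sets)

# Hypothetical 1:2 matched sets with two risk factors: (case, [control, control])
sets = [((1, 0), [(0, 0), (0, 1)]),
        ((1, 1), [(1, 0), (0, 0)])]
print(conditional_log_likelihood([0.0, 0.0], sets))
print(conditional_log_likelihood([0.5, 0.2], sets))
```

With all coefficients equal to zero, every member of a 1:2 set is equally likely to be the case, so each set contributes ln(1/3); estimation searches for the coefficients that make the observed cases most likely within their sets.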
II Applied example
e.g. 16-3: A study of the risk factors for laryngeal cancer in a northern city used a 1:2 matched case-control design. Six possible risk factors and 25 matched sets are selected here; the coding is given in Table 16-6 and the data in Table 16-7.
Table 16-6 Risk factors for laryngeal cancer and their coding
Factor                    Variable   Coding
Pharyngitis               X1         no = 1, occasionally = 2, often = 3
Smoking (cig/day)         X2         0 = 1, 1~4 = 2, 5~9 = 3, 10~20 = 4, 20~ = 5
Hoarseness                X3         no = 1, occasionally = 2, often = 3
Fresh vegetable intake    X4         little = 1, occasionally = 2, every day = 3
Fruit intake              X5         rare = 1, little = 2, often = 3
Family history of cancer  X6         no = 0, yes = 1
Laryngeal cancer          Y          case = 1, control = 0
Table 16-7 Data from the 1:2 matched case-control study of laryngeal cancer (see P344)
Stepwise variable selection is applied to the six risk factors; four factors enter the equation. Table 16-8 shows the results.
Table 16-8 e.g. 16-3: independent variables entering the equation and estimates of the related parameters
Variable   Regression coefficient b   Standard error S_b   Wald chi-square   OR      P
X2         1.4869                     0.5506               7.29              4.42    0.0069
X3         1.9166                     0.9444               4.12              6.80    0.0424
X4         -3.7641                    1.8251               4.25              0.02    0.0392
X6         3.6321                     1.8657               3.79              37.79   0.0516
The four risk factors that entered are smoking ($X_2$), hoarseness ($X_3$), whether fresh vegetables are eaten often ($X_4$) and family history of cancer ($X_6$). Of these, eating fresh vegetables is a protective factor: its coefficient is negative and its OR is below 1.
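The OR column of Table 16-8 can be checked against $OR = \exp(b)$. In this sketch the coefficient of $X_4$ is entered as negative; that sign is an assumption made here because the reported OR of 0.02 is only reproduced by $\exp(-3.7641)$, which also matches the statement that fresh vegetables are protective.

```python
import math

# Table 16-8: variable -> (b, reported OR). The negative sign for X4 is an
# assumption inferred from its reported OR of 0.02.
table = {
    "X2": (1.4869, 4.42),
    "X3": (1.9166, 6.80),
    "X4": (-3.7641, 0.02),
    "X6": (3.6321, 37.79),
}

ors = {name: math.exp(b) for name, (b, _) in table.items()}
for name, (b, reported) in table.items():
    kind = "protective" if ors[name] < 1 else "risk"
    print(f"{name}: exp({b}) = {ors[name]:.2f} (reported {reported}), {kind} factor")
```

Because $X_2$, $X_3$ and $X_4$ are coded in several ordered levels, each of their ORs refers to a one-level increase in the coded variable.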
3 Applications of logistic regression and points to note
I Applications of logistic regression
1. Analysis of epidemiologic risk factors. One feature of logistic regression is that its parameters have a clear meaning (each regression coefficient translates directly into an odds ratio), so it is well suited to epidemiologic studies.
2. Analysis of clinical trials. The goal of a clinical trial is to assess the effect of a drug or treatment. If there are confounding factors that are not balanced between the groups, the final results will be biased, so these factors must be adjusted for in the analysis. When the dependent variable is binary, logistic regression can be used to obtain the adjusted results.
3. Analysis of the dose-response of drugs or poisons. In dose-response studies of a drug or poison, taking the logarithm of the dose makes the probability distribution close to normal. Because the normal distribution function is very similar to the logistic function, the relationship can be expressed through a logistic model of P on the logarithm of X (where P is the positive rate and X is the dose).
4. Prediction and discrimination. Logistic regression is a probability model, so it can be used to predict the probability of an event. For example, in clinical practice it can be used to estimate the probability of a disease given certain indicators. Please refer to Chapter 18 on discrimination.
II Points to note when applying logistic regression
1. The coding of variables (the same as in Chapter 15)
2. Sample size and the number of independent variables
3. Evaluation of the model
4. Multi-category logistic regression
Summary
Purpose: Derive the logistic regression equation, which is used to estimate the probability of the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a probability-type, nonlinear regression.
Data: 1. The dependent variable is a binary categorical variable with two values, such as "yes" and "no". 2. All, or at least most, of the independent variables should be categorical; some of them may be numerical. Categorical variables must be coded numerically.
Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and several risk factors.
Category: 1. Non-conditional logistic regression (unmatched, between-subjects data). 2. Conditional logistic regression (paired or matched data).
Thinking
To analyze the factors that influence the outcome of rescue in AMI patients, a hospital collected five years of data on AMI patients, 200 cases in total (many related factors were recorded; for reasons of space only three are listed here). The data are shown in the following table. P = 0 means successful rescue and P = 1 means death. X1 = 1 means shock before rescue and X1 = 0 means no shock before rescue. X2 = 1 means heart failure before rescue and X2 = 0 means no heart failure before rescue. X3 = 1 means that more than 12 hours passed from the onset of AMI symptoms to rescue and X3 = 0 means that no more than 12 hours passed. Which analysis method is best? Why? What results can we obtain?
Rescue risk factor data for the AMI patients
P = 0 (successfully rescued)     P = 1 (death)
X1   X2   X3   N                 X1   X2   X3   N
0    0    0    35                0    0    0    4
0    0    1    34                0    0    1    10
0    1    0    17                0    1    0    4
0    1    1    19                0    1    1    15
1    0    0    17                1    0    0    6
1    0    1    6                 1    0    1    9
1    1    0    6                 1    1    0    6
1    1    1    6                 1    1    1    6