Chapter 16 Logistic Regression Analysis

Content
1 Logistic regression
2 Conditional logistic regression
3 Application

Purpose: Work out the logistic regression equation, which is used to estimate the dependent variable (outcome factor) from the independent variables (risk factors). Logistic regression is a kind of nonlinear regression.

Data:
1. The dependent variable is a binary categorical variable that takes two values, such as "yes" and "no".
2. All, or at least most, of the independent variables should be categorical, although some may be numerical. Categorical variables must be coded numerically.

Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors. By comparison, the χ² test (or u test) has two limitations:
1. It can study only one risk factor at a time.
2. It can yield only a qualitative conclusion.

Category:
1. Between-subjects (non-conditional) logistic regression equation
2. Paired (conditional) logistic regression equation

1 Logistic regression (non-conditional logistic regression)

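I Basic concept

The probability of a positive outcome under the effect of m independent variables is written as:

$$P = P(Y = 1 \mid X_1, X_2, \ldots, X_m)$$

Regression model:

$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)]}$$

If $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$, then

$$P = \frac{1}{1 + e^{-Z}}, \qquad \operatorname{logit} P = \ln \frac{P}{1 - P} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$$

Range: $P$ lies between 0 and 1; $\operatorname{logit} P$ ranges from $-\infty$ to $+\infty$.

Here $\beta_0$ is the constant term and $\beta_1, \beta_2, \ldots, \beta_m$ are the regression coefficients.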

Figure: the logistic function, with P on the vertical axis plotted against Z on the horizontal axis for Z from -4 to 4. P rises in an S shape from 0.01799 at Z = -4, through 0.5 at Z = 0, to 0.98201 at Z = 4.
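The curve in the figure can be reproduced directly from the logistic function. The following is a minimal sketch in Python; the function names `logistic` and `logit` are chosen here for illustration and do not come from the slides.

```python
import math

def logistic(z: float) -> float:
    """Return P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Return ln(p / (1 - p)), the inverse of logistic()."""
    return math.log(p / (1.0 - p))

# Tabulate the curve at a few of the Z values plotted in the figure.
curve = {z: round(logistic(z), 5) for z in (-4, -2, -1, 0, 1, 2, 4)}
for z, p in curve.items():
    print(f"Z = {z:+d}  P = {p:.5f}")
```

The printed values match the plotted points (for example P = 0.01799 at Z = -4 and P = 0.98201 at Z = 4), and `logit` maps each P back to its Z.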

The meaning of the model parameters: The constant term $\beta_0$ is the natural logarithm of the odds (the ratio of the probability that the disease occurs to the probability that it does not) when every exposure dose is zero. The regression coefficient $\beta_j$ is the change in $\operatorname{logit} P$ when the independent variable $X_j$ changes by one unit.

The odds ratio is the statistical indicator used in epidemiology to measure the effect of a risk factor. It is computed as:

$$OR_j = \frac{P_1 / (1 - P_1)}{P_0 / (1 - P_0)}$$

In the formula, $P_1$ is the incidence of the disease when $X_j$ is $c_1$, and $P_0$ is the incidence of the disease when $X_j$ is $c_0$. $OR_j$ is called the adjusted odds ratio when the other variables have been adjusted for: it shows the effect of the risk factor free of the influence of the other independent variables.

The relationship with logit P

Comparing the disease status at two different exposure levels of one risk factor ($X_j = c_1$ and $X_j = c_0$), the natural logarithm of the odds ratio is:

$$\ln OR_j = \operatorname{logit} P_1 - \operatorname{logit} P_0 = \beta_j (c_1 - c_0)$$

so that

$$OR_j = \exp[\beta_j (c_1 - c_0)]$$

$\beta_0$ is usually regarded as an ineffective parameter, because $OR_j$ does not depend on $\beta_0$.
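That the odds ratio depends only on the coefficient of the factor being compared, and not on the constant term or on the levels of the other variables, can be checked numerically. The sketch below assumes the usual logistic model with two independent variables; the coefficient values are hypothetical and serve only as an illustration.

```python
import math

def prob(beta0, betas, xs):
    """Logistic model: P = 1 / (1 + exp(-(beta0 + sum of beta_j * x_j)))."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of the event: p / (1 - p)."""
    return p / (1.0 - p)

def odds_ratio(beta0, betas, xs, j, c1, c0):
    """OR for X_j at level c1 versus c0, the other variables held at xs."""
    x1 = list(xs)
    x1[j] = c1
    x0 = list(xs)
    x0[j] = c0
    return odds(prob(beta0, betas, x1)) / odds(prob(beta0, betas, x0))

betas = [0.9, 0.5]  # hypothetical coefficients for X1 and X2
for beta0 in (-2.0, 0.0, 1.5):      # hypothetical constant terms
    for other in (0, 1):            # level of the other variable X2
        or_1 = odds_ratio(beta0, betas, [0, other], j=0, c1=1, c0=0)
        print(beta0, other, round(or_1, 6), round(math.exp(betas[0]), 6))
```

Every line prints the same odds ratio, equal to exp(0.9), whatever the constant term and whatever the level of the second variable.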

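II Parameter estimation of the logistic regression model

1. Parameter estimation

Principle: maximum likelihood estimation. For $n$ independent observations with outcomes $Y_i$ (1 or 0) and model probabilities $P_i$, the likelihood function and its logarithm are

$$L = \prod_{i=1}^{n} P_i^{Y_i} (1 - P_i)^{1 - Y_i}$$

$$\ln L = \sum_{i=1}^{n} \left[ Y_i \ln P_i + (1 - Y_i) \ln (1 - P_i) \right]$$

The estimates $b_0, b_1, \ldots, b_m$ of the parameters $\beta_0, \beta_1, \ldots, \beta_m$ are the values that maximize $\ln L$.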
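Maximum likelihood estimates have no closed form and are found iteratively. The sketch below is one possible illustration, not the procedure used to produce the chapter's results: it applies Newton-Raphson iteration to the grouped counts of Table 16-1 (Example 16-1 in this chapter), using only the Python standard library. The helper names `solve` and `fit` are introduced here for illustration.

```python
import math

# Grouped data of Table 16-1: (X1 smoking, X2 drinking, n_g subjects, d_g positive)
DATA = [(0, 0, 199, 63), (0, 1, 170, 63), (1, 0, 101, 44), (1, 1, 416, 265)]

def solve(a, y):
    """Solve the linear system a x = y by Gauss-Jordan elimination."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(data, iterations=25):
    """Maximize ln L by Newton-Raphson; return (estimates, standard errors)."""
    k = 3  # constant term, X1, X2
    b = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k                     # first derivatives of ln L
        info = [[0.0] * k for _ in range(k)]  # information matrix
        for x1, x2, n, d in data:
            x = (1.0, x1, x2)
            p = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, x))))
            for i in range(k):
                score[i] += x[i] * (d - n * p)
                for j in range(k):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, score)
        b = [bj + sj for bj, sj in zip(b, step)]
    # Standard errors: square roots of the diagonal of the inverse of the
    # information matrix from the last iteration (already converged).
    se = []
    for i in range(k):
        unit = [1.0 if r == i else 0.0 for r in range(k)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return b, se

b, se = fit(DATA)
print("b  =", [round(v, 3) for v in b])
print("Sb =", [round(v, 3) for v in se])
```

Run on Table 16-1, the estimates should agree to about three decimal places with the values reported in the results of Example 16-1.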

2. Estimation of OR

The odds ratio between two different levels $c_1$ and $c_0$ of one factor is estimated by

$$\widehat{OR}_j = \exp[b_j (c_1 - c_0)]$$

If the independent variable $X_j$ has only two levels, exposure and non-exposure (coded 1 and 0), the $1 - \alpha$ confidence interval of $OR_j$ is estimated by:

$$\exp(b_j \pm u_{\alpha/2} S_{b_j})$$

Example 16-1: Table 16-1 gives case-control data used to study the relationship between smoking, drinking and esophageal cancer. Run a logistic regression analysis. First define the coding of each variable.

Table 16-1 Case-control data on the relationship between smoking and esophageal cancer

Stratum   Smoking   Drinking   Number observed   Positive   Negative
g         X1        X2         n_g               d_g        n_g - d_g
1         0         0          199               63         136
2         0         1          170               63         107
3         1         0          101               44         57
4         1         1          416               265        151

Results: fitting the logistic regression gives

$b_0 = -0.9099$, $S_{b_0} = 0.1358$
$b_1 = 0.8856$, $S_{b_1} = 0.1500$
$b_2 = 0.5261$, $S_{b_2} = 0.1572$

The OR of smoking versus nonsmoking:

$$\widehat{OR}_1 = \exp(b_1) = \exp(0.8856) = 2.42$$

95% confidence interval of $OR_1$:

$$\exp(b_1 \pm u_{0.05/2} S_{b_1}) = \exp(0.8856 \pm 1.96 \times 0.1500) = (1.81, 3.25)$$

The OR of drinking versus not drinking:

$$\widehat{OR}_2 = \exp(b_2) = \exp(0.5261) = 1.69$$

95% confidence interval of $OR_2$:

$$\exp(b_2 \pm u_{0.05/2} S_{b_2}) = \exp(0.5261 \pm 1.96 \times 0.1572) = (1.24, 2.30)$$
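The odds ratios and their confidence intervals follow from the coefficients and standard errors by simple arithmetic. A minimal sketch, assuming the two-level (0/1) coding and the normal critical value 1.96 for a 95% interval; the function name `odds_ratio_ci` is introduced here for illustration.

```python
import math

def odds_ratio_ci(b, se, u=1.96):
    """Return (OR, lower, upper) = (exp(b), exp(b - u*se), exp(b + u*se))."""
    return math.exp(b), math.exp(b - u * se), math.exp(b + u * se)

# Coefficients and standard errors reported for Example 16-1.
estimates = {"smoking (X1)": (0.8856, 0.1500), "drinking (X2)": (0.5261, 0.1572)}
results = {name: odds_ratio_ci(b, se) for name, (b, se) in estimates.items()}
for name, (or_, lower, upper) in results.items():
    print(f"{name}: OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Neither interval contains 1, which is consistent with both factors being associated with the disease.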

III Hypothesis tests for the logistic regression model

1. Likelihood ratio test

2. Wald test: each parameter estimate $b_j$ is compared with zero, using its standard error $S_{b_j}$ as the yardstick. The test statistics are:

$$u = \frac{b_j}{S_{b_j}} \quad \text{or} \quad \chi^2 = \left( \frac{b_j}{S_{b_j}} \right)^2, \quad \nu = 1$$

In Example 16-1 both $\chi^2$ values are greater than 3.84, so esophageal cancer is associated with both smoking and drinking. The conclusion is the same as above.
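The Wald statistics for Example 16-1 can be recomputed from the reported coefficients and standard errors. The sketch below is illustrative; it obtains the P value from the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math

def wald_test(b, se):
    """Return (u, chi2, p) for H0: beta = 0; chi2 has 1 degree of freedom."""
    u = b / se
    chi2 = u * u
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return u, chi2, p

# Coefficients and standard errors reported for Example 16-1.
tests = {"X1": wald_test(0.8856, 0.1500), "X2": wald_test(0.5261, 0.1572)}
for name, (u, chi2, p) in tests.items():
    print(f"{name}: u = {u:.3f}, chi2 = {chi2:.2f}, P = {p:.2g}")
```

Both chi-square values (about 34.86 for smoking and 11.20 for drinking) are well above the critical value 3.84.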

IV Variable selection

Methods: forward selection, backward elimination and stepwise regression.

Test statistic: not the F statistic, but one of the likelihood ratio, Wald and score test statistics.

Example 16-2: To investigate the risk factors for coronary heart disease, a case-control study was carried out on 26 patients with coronary heart disease and 28 controls. Table 16-2 and Table 16-3 give the definition of each factor and the data. Use stepwise logistic regression to select the risk factors.

Table 16-2 Eight possible risk factors for coronary heart disease and their coding

Factor   Variable   Coding
Age      X1

Table 16-3 Case-control data on the risk factors for coronary heart disease

Order   X1   X2   X3   X4   X5   X6   X7   X8   Y
1       3    1    0    1    0    0    1    1    0
2       2    0    1    1    0    0    1    0    0
3       2    1    0    1    0    0    1    0    0
4       2    0    0    1    0    0    1    0    0
5       3    0    0    1    0    1    1    1    0
6       3    0    1    1    0    0    2    1    0
7       2    0    1    0    0    0    1    0    0
8       3    0    1    1    1    0    1    0    0
9       2    0    0    0    0    0    1    1    0
10      1    0    0    1    0    0    1    0    0
...     ...  ...  ...  ...  ...  ...  ...  ...  ...
51      2    0    1    1    0    1    2    1    1
52      2    1    1    1    0    0    2    1    1
53      2    1    0    1    0    0    1    1    1
54      3    1    1    0    1    0    3    1    1

Table 16-4 Example 16-2: independent variables entering the equation and estimates of the related parameters

Learn how to read the results.

Variable   Regression coefficient (b)   Standard error (S_b)   Wald χ²   P        Standardized regression coefficient (b')   OR
Constant   -4.705                       1.543                  9.30      0.0023   --                                         --
X1         0.924                        0.477                  3.76      0.0525   0.401                                      2.52
X5         1.496                        0.744                  4.04      0.0443   0.406                                      4.46
X6         3.136                        1.249                  6.30      0.0121   0.703                                      23.00
X8         1.947                        0.847                  5.29      0.0215   0.523                                      7.01

In the end four risk factors enter the logistic regression model: increasing age ($X_1$), history of high blood lipids ($X_5$), animal fat intake ($X_6$) and type A personality ($X_8$).

The standardized regression coefficient $b'_j$ can be used to compare the importance of the factors:

$$b'_j = \frac{b_j S_j}{\pi / \sqrt{3}}$$

where $S_j$ is the standard deviation of $X_j$ and $\pi = 3.1416$.
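The standardized coefficient rescales each b by the spread of its variable, so that factors measured on different scales can be ranked. The sketch below assumes the form b' = b S / (π/√3), with S the standard deviation of the variable. Because the standard deviations are not listed in this section, it back-solves the S implied by each (b, b') pair of Table 16-4 instead of computing b' from raw data; the helper names are introduced here for illustration.

```python
import math

SCALE = math.pi / math.sqrt(3.0)  # standard deviation of the standard logistic distribution

def standardized(b, s):
    """b' = b * S / (pi / sqrt(3)), with S the standard deviation of X."""
    return b * s / SCALE

def implied_sd(b, b_std):
    """Back-solve S from a reported pair (b, b')."""
    return b_std * SCALE / b

# (b, b') pairs from Table 16-4.
table_16_4 = {"X1": (0.924, 0.401), "X5": (1.496, 0.406),
              "X6": (3.136, 0.703), "X8": (1.947, 0.523)}
implied = {name: implied_sd(b, bs) for name, (b, bs) in table_16_4.items()}
ranking = sorted(table_16_4, key=lambda name: table_16_4[name][1], reverse=True)

for name in ranking:
    print(name, "b' =", table_16_4[name][1], "implied S =", round(implied[name], 3))
```

Ranked by b', animal fat intake (X6) comes first, followed by X8, X5 and X1. The implied S values for the 0/1 variables stay at or below about 0.5, as a standard deviation of a binary variable must.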

2 Conditional logistic regression

I Principle

In matched data, the most common design has one case and several controls in each matched set, that is, a 1:M matched study, usually with a small M.

Table 16-5 Data format for 1:M conditional logistic regression

Matched set   Number in set   Dependent variable   Risk factors
i             t               Y                    X1      X2      ...   Xm
1             0               1                    X101    X102    ...   X10m
              1               0                    X111    X112    ...   X11m
              2               0                    X121    X122    ...   X12m
              ...
              M               0                    X1M1    X1M2    ...   X1Mm
...
n             0               1                    Xn01    Xn02    ...   Xn0m
              1               0                    Xn11    Xn12    ...   Xn1m
              2               0                    Xn21    Xn22    ...   Xn2m
              ...
              M               0                    XnM1    XnM2    ...   XnMm

* t = 0 is the case and the others are the controls.

The conditional logistic model

$$P_i = \frac{1}{1 + \exp[-(\beta_{0i} + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)]}, \quad i = 1, 2, \ldots, n$$

$P_i$ is the probability of disease in stratum $i$ under the effect of a group of risk factors, $\beta_{0i}$ is the effect of stratum $i$, and $\beta_1, \beta_2, \ldots, \beta_m$ are the parameters to be estimated.

The difference from the non-conditional logistic regression model lies in the constant term: the $\beta_{0i}$ may differ from one stratum to another, but the effect of each risk factor in causing disease is assumed to be the same across the matched sets.
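Because every matched set has its own constant term, those constants are normally eliminated by conditioning on the set. For a set with one case (t = 0) and M controls, the standard conditional likelihood contribution is exp(β·x_0) / Σ_t exp(β·x_t), from which the stratum constant cancels. The sketch below evaluates that conditional log-likelihood; the matched sets and the β values are hypothetical and are not the data of this chapter.

```python
import math

def conditional_loglik(beta, matched_sets):
    """Sum over matched sets of ln[ exp(b.x_case) / sum over t of exp(b.x_t) ].

    Each matched set is a list of covariate vectors; element 0 is the case
    and the remaining M elements are its controls.
    """
    total = 0.0
    for rows in matched_sets:
        scores = [sum(b * x for b, x in zip(beta, row)) for row in rows]
        top = max(scores)  # subtract the maximum for numerical stability
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        total += scores[0] - log_denominator
    return total

# Hypothetical 1:2 matched sets with two risk factors (X1, X2).
sets = [
    [(1, 1), (0, 1), (0, 0)],
    [(1, 0), (1, 1), (0, 0)],
    [(0, 1), (0, 0), (0, 0)],
]
ll_null = conditional_loglik((0.0, 0.0), sets)  # beta = 0: each member equally likely
ll_alt = conditional_loglik((0.8, 0.3), sets)   # hypothetical beta values
print(round(ll_null, 4), round(ll_alt, 4))
```

With β = 0 each of the three members of a set is equally likely to be the case, so the log-likelihood is 3 ln(1/3). Adding the same constant to every member of a set leaves the value unchanged, which is why the stratum effects need not be estimated.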

II Applied example

Example 16-3: A study of the risk factors for laryngeal cancer in a northern city used a 1:2 matched case-control design. Six possible risk factors and 25 matched sets are selected here; the coding is given in Table 16-6 and the data in Table 16-7.

Table 16-6 Risk factors for laryngeal cancer and their coding

Factor                    Variable   Coding
Pharyngitis               X1         no=1, occasionally=2, often=3
Smoking (cig/day)         X2         0=1, 1~4=2, 5~9=3, 10~20=4, 20~=5
Hoarseness                X3         no=1, occasionally=2, often=3
Fresh vegetable intake    X4         little=1, occasionally=2, every day=3
Fruit intake              X5         rare=1, little=2, often=3
Family cancer history     X6         no=0, yes=1
Laryngeal cancer          Y          case=1, control=0

Table 16-7 Data from the 1:2 matched case-control study of laryngeal cancer (see p. 344)

Stepwise selection was applied to the six risk factors, and four factors entered the equation. Table 16-8 shows the results.

Table 16-8 Example 16-3: independent variables entering the equation and estimates of the related parameters

Entering variable   Regression coefficient (b)   Standard error (S_b)   Wald χ²   OR      P
X2                  1.4869                       0.5506                 7.29      4.42    0.0069
X3                  1.9166                       0.9444                 4.12      6.80    0.0424
X4                  -3.7641                      1.8251                 4.25      0.02    0.0392
X6                  3.6321                       1.8657                 3.79      37.79   0.0516

The four risk factors that entered are smoking ($X_2$), hoarseness ($X_3$), whether fresh vegetables are eaten often ($X_4$) and family cancer history ($X_6$). Of these, eating fresh vegetables is a protective factor.
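The OR column of Table 16-8 is the exponential of the coefficient, that is, the odds ratio for a one-level increase in the coded variable. For a factor coded in several ordered levels, the formula OR = exp[b(c1 - c0)] also gives the odds ratio between more distant levels, on the model's assumption that each step has the same effect. In the sketch below the coefficient of X4 is taken as negative, consistent with its reported OR of 0.02 and its description as a protective factor.

```python
import math

# (b, reported OR) from Table 16-8; the sign of X4 is taken as negative here.
table_16_8 = {"X2": (1.4869, 4.42), "X3": (1.9166, 6.80),
              "X4": (-3.7641, 0.02), "X6": (3.6321, 37.79)}

def odds_ratio(b, c1, c0):
    """OR for level c1 versus level c0 of one factor: exp[b (c1 - c0)]."""
    return math.exp(b * (c1 - c0))

# One-level increase: should reproduce the OR column of the table.
per_level = {name: odds_ratio(b, 1, 0) for name, (b, _) in table_16_8.items()}

# Smoking (X2) is coded 1 to 5, so the highest versus the lowest category
# spans four levels.
smoking_5_vs_1 = odds_ratio(table_16_8["X2"][0], 5, 1)

for name, value in per_level.items():
    print(name, round(value, 2), "reported:", table_16_8[name][1])
print("X2, level 5 versus level 1:", round(smoking_5_vs_1, 1))
```

The one-level odds ratios reproduce the table, while the level 5 versus level 1 comparison for smoking is the one-level odds ratio raised to the fourth power.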

3 Applications of logistic regression and points to note

I Applications of logistic regression

1. Analysis of epidemiologic risk factors. One feature of logistic regression is that its parameters have a clear meaning, so it is well suited to epidemiologic studies.

2. Analysis of clinical trials. The goal of a clinical trial is to assess the effect of a drug or treatment. If confounding factors are present and are not balanced between the groups, the final results will be biased, so these factors must be adjusted for in the analysis. When the dependent variable is binary, logistic regression can be used to obtain the adjusted results.

3. Analysis of the dose-response of drugs or poisons. In dose-response studies of drugs or poisons, the response distribution is close to normal when the dose is expressed as its logarithm. Because the normal distribution function is very similar to the logistic function, the relationship can be expressed through the following model, where P is the positive rate and X is the dose:

$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta \ln X)]}$$
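A dose-response model of this kind can be sketched as follows, assuming it takes the form logit P = β0 + β ln X. The parameter values are hypothetical. The dose at which half of the subjects respond follows from setting the logit to zero.

```python
import math

def positive_rate(dose, beta0, beta):
    """P = 1 / (1 + exp(-(beta0 + beta * ln(dose))))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta * math.log(dose))))

def median_effective_dose(beta0, beta):
    """Dose at which P = 0.5, i.e. where beta0 + beta * ln(dose) = 0."""
    return math.exp(-beta0 / beta)

beta0, beta = -4.0, 2.0  # hypothetical parameter values
ed50 = median_effective_dose(beta0, beta)
for dose in (1.0, ed50, 20.0):
    print(f"dose = {dose:6.3f}  P = {positive_rate(dose, beta0, beta):.3f}")
```

With a positive β the positive rate rises with dose and passes through 0.5 at the median effective dose.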

4. Prediction and discrimination. Logistic regression is a probability model, so it can be used to predict the probability that an event occurs. In clinical practice, for example, it can be used to estimate the probability of a disease given certain indices. See Chapter 18 on discrimination.
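As an illustration of prediction, the sketch below plugs the estimates of Table 16-4 into the logistic model and classifies a subject with a cutoff of 0.5. The two subject profiles and the cutoff are hypothetical. Because the estimates come from a case-control study, the constant term reflects the sampling ratio of cases to controls, so the computed probabilities are better read as a discriminant score than as population risks.

```python
import math

# Estimates from Table 16-4 (Example 16-2).
CONSTANT = -4.705
COEFFICIENTS = {"X1": 0.924, "X5": 1.496, "X6": 3.136, "X8": 1.947}

def predicted_probability(profile):
    """P = 1 / (1 + exp(-Z)), with Z the fitted linear combination."""
    z = CONSTANT + sum(COEFFICIENTS[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-z))

def classify(profile, cutoff=0.5):
    """Return 1 (classified as case) if the predicted probability reaches the cutoff."""
    return 1 if predicted_probability(profile) >= cutoff else 0

# Hypothetical subjects.
low = {"X1": 2, "X5": 0, "X6": 0, "X8": 0}
high = {"X1": 3, "X5": 1, "X6": 0, "X8": 1}
for label, profile in (("low", low), ("high", high)):
    print(label, round(predicted_probability(profile), 3), classify(profile))
```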

II Points to note when applying logistic regression

1. Coding of the variables (the same as in Chapter 15)

2. Sample size and the number of independent variables

3. Evaluation of the model

4. Multi-category logistic regression

Summary

Purpose: Work out the logistic regression equation, which is used to estimate the dependent variable (outcome factor) from the independent variables (risk factors). Logistic regression is a probability-type, nonlinear regression.

Data:
1. The dependent variable is a binary categorical variable that takes two values, such as "yes" and "no".
2. All, or at least most, of the independent variables should be categorical, although some may be numerical. Categorical variables must be coded numerically.

Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors.

Category:
1. Between-subjects (non-conditional) logistic regression equation
2. Paired (conditional) logistic regression equation

Thinking

To analyze the factors that influence the outcome of rescue in AMI patients, a hospital collected five years of data on AMI patients, 200 cases in total. (There are many related factors; only three are listed here for reasons of space.) The data are shown in the following table, where:

P = 0 means successful rescue and P = 1 means death;
X1 = 1 means shock before rescue and X1 = 0 means no shock before rescue;
X2 = 1 means heart failure before rescue and X2 = 0 means no heart failure before rescue;
X3 = 1 means that more than 12 hours passed from the onset of AMI symptoms to rescue, and X3 = 0 means that no more than 12 hours passed.

Which analysis method is the best one, and why? What results can be obtained?

Data on the risk factors for the rescue outcome of AMI patients

P = 0 (successfully rescued)      P = 1 (death)
X1   X2   X3   N                  X1   X2   X3   N
0    0    0    35                 0    0    0    4
0    0    1    34                 0    0    1    10
0    1    0    17                 0    1    0    4
0    1    1    19                 0    1    1    15
1    0    0    17                 1    0    0    6
1    0    1    6                  1    0    1    9
1    1    0    6                  1    1    0    6
1    1    1    6                  1    1    1    6
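As a first look at these data, before any multivariable model is fitted, the crude (unadjusted) odds ratio of death can be computed for each factor by collapsing the table over the other two. The sketch below is only a starting point for the exercise, not its answer. It reads the table as eight strata of (X1, X2, X3) with counts of rescued and dead patients, a reading that reproduces the stated total of 200 cases.

```python
# Rows: (X1, X2, X3, N rescued [P = 0], N died [P = 1]).
AMI = [
    (0, 0, 0, 35, 4), (0, 0, 1, 34, 10), (0, 1, 0, 17, 4), (0, 1, 1, 19, 15),
    (1, 0, 0, 17, 6), (1, 0, 1, 6, 9), (1, 1, 0, 6, 6), (1, 1, 1, 6, 6),
]

def crude_odds_ratio(rows, factor):
    """Collapse over the other factors and return the 2x2 odds ratio of death."""
    died = {0: 0, 1: 0}
    rescued = {0: 0, 1: 0}
    for row in rows:
        level = row[factor]
        rescued[level] += row[3]
        died[level] += row[4]
    return (died[1] * rescued[0]) / (rescued[1] * died[0])

total = sum(row[3] + row[4] for row in AMI)
crude = {name: crude_odds_ratio(AMI, i) for i, name in enumerate(("X1", "X2", "X3"))}
print("total cases:", total)
for name, value in crude.items():
    print(name, "crude OR of death =", round(value, 2))
```

The crude odds ratios ignore the other two factors. Adjusting each factor for the others is the kind of problem that the logistic regression model of this chapter addresses.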