Logistic regression

Dr. Khaled Mahmoud Abd Elaziz
Lecturer of public health and preventive medicine, Faculty of Medicine, Ain Shams University

Logistic regression made easy

Transcript of Logistic regression

Page 2: Logistic regression

Logistic regression is very similar to linear regression.

When do we use logistic regression? We use it when we have a binary outcome of interest and a number of explanatory variables.

Outcome: e.g. the presence or absence of a symptom, or the presence or absence of a disease.

Page 4: Logistic regression

From the equation of the logistic regression model we can:

1- Determine which explanatory variables influence the outcome, that is, which variables have the highest odds ratio (OR), or risk, for producing the outcome.

(1 = has the disease, 0 = doesn't have the disease)

Page 5: Logistic regression

From the equation of the logistic regression model we can:

2- Use an individual's values of the explanatory variables to estimate the probability that he or she will have a particular outcome.

Page 6: Logistic regression

We start the logistic regression model by creating a binary variable to represent the outcome (dependent variable): 1 = has the disease, 0 = doesn't have the disease.

We take the probability P that an individual has the higher coded category (has the disease) as the dependent variable.

We use the logit (logistic) transformation in the regression equation.
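A minimal Python sketch of these first steps, assuming the outcome arrives as text labels (the labels and the helper names `code_outcome` and `logit` are hypothetical, not from the slides): code the outcome 1/0, take the observed proportion with the disease as P, and apply the logit transformation ln(P / (1 - P)) defined on the next slide.

```python
import math

def code_outcome(labels, positive="disease"):
    """Code a binary outcome: 1 = has the disease, 0 = doesn't have the disease."""
    return [1 if label == positive else 0 for label in labels]

def logit(p):
    """Natural log of the odds, ln(P / (1 - P)); defined only for 0 < P < 1."""
    return math.log(p / (1 - p))

# Hypothetical outcome labels for five individuals
labels = ["disease", "no disease", "no disease", "disease", "no disease"]
y = code_outcome(labels)
p = sum(y) / len(y)      # observed proportion with the disease
odds = p / (1 - p)       # odds of the disease
print(y, p, round(odds, 4), round(logit(p), 4))
```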

Page 7: Logistic regression

The logit is the natural logarithm of the odds of 'disease':

$$\operatorname{logit}(P) = \ln\left(\frac{P}{1 - P}\right)$$

The logistic regression equation:

$$\operatorname{logit}(P) = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_iX_i$$

X = explanatory variables.

P = estimated value of the true probability that an individual with a particular set of values for X has the disease. P corresponds to the proportion with the disease; it has an underlying binomial distribution.

a = the constant (intercept).

b = estimated logistic regression coefficients. The exponential of a particular coefficient, for example $e^{b_1}$, is an estimate of the odds ratio.
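A minimal Python sketch of this equation, assuming coefficients and explanatory values are passed as plain lists (the function names `linear_predictor`, `inverse_logit`, and `odds_ratios` are illustrative, not from the slides). The linear predictor gives logit(P); inverting the logit, P = e^z / (1 + e^z), gives the estimated probability; exponentiating each b gives its estimated odds ratio.

```python
import math

def linear_predictor(a, b, x):
    """logit(P) = a + b1*X1 + b2*X2 + ... + bi*Xi."""
    return a + sum(bi * xi for bi, xi in zip(b, x))

def inverse_logit(z):
    """Convert a logit back to a probability: P = e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

def odds_ratios(b):
    """e^bi estimates the odds ratio associated with each coefficient bi."""
    return [math.exp(bi) for bi in b]

# Hypothetical values: a = -1.0, b1 = 0.5, b2 = 2.0, X1 = 1, X2 = 1
z = linear_predictor(-1.0, [0.5, 2.0], [1, 1])
print(z, round(inverse_logit(z), 4), odds_ratios([0.5, 2.0]))
```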

Page 8: Logistic regression

For X1, $e^{b_1}$ is the factor by which the estimated odds of the disease change for a one-unit increase in X1, while adjusting for all other X's in the equation.

As logistic regression is fitted on a log scale, the effects of the X's are multiplicative on the odds of the disease. This means that their combined effect is the product of their separate effects.

This is unlike linear regression, where the effects of the X's on the dependent variable are additive.
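To illustrate these two points, here is a small Python check with made-up coefficients (a, b1 and b2 are hypothetical values, not from any study): the odds ratio for a one-unit increase in X1 equals e^b1 whatever the value of X2, and the combined effect of raising both X1 and X2 is the product of their separate odds ratios.

```python
import math

# Hypothetical coefficients, chosen only for illustration
a, b1, b2 = -2.0, 0.8, 1.5

def odds(x1, x2):
    """Odds of disease = exp(logit) = e^(a + b1*X1 + b2*X2)."""
    return math.exp(a + b1 * x1 + b2 * x2)

# Odds ratio for a one-unit increase in X1, holding X2 fixed at different values
for x2 in (0, 1, 3):
    print(x2, odds(1, x2) / odds(0, x2), math.exp(b1))

# Combined effect of X1 = 1 and X2 = 1 versus both 0: product of the separate odds ratios
combined = odds(1, 1) / odds(0, 0)
print(combined, math.exp(b1) * math.exp(b2))
```

On the logit scale the two effects simply add (b1 + b2); exponentiating to the odds scale turns that sum into a product.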

Page 9: Logistic regression

Plain English:

1- Take the variables that were significant in the univariate analysis.

2- Set the P value at which you will take those variables into the model, e.g. 0.05 or 0.1.

3- If all variables in the univariate analysis are insignificant, don't bother doing logistic regression. There is no question here about those variables for prediction of the disease.
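Steps 1 to 3 amount to a simple screening rule. A minimal Python sketch, assuming the univariate P values have already been computed and are held in a dictionary (the function name, variable names and P values below are hypothetical, and "below the threshold" is taken here as strictly less than):

```python
def select_candidates(univariate_p, threshold=0.05):
    """Keep variables whose univariate P value is below the chosen threshold (e.g. 0.05 or 0.1)."""
    selected = [name for name, p in univariate_p.items() if p < threshold]
    if not selected:
        # Step 3: nothing is significant, so logistic regression has no question to answer
        return None
    return selected

# Hypothetical univariate P values, for illustration only
univariate_p = {"var_a": 0.01, "var_b": 0.08, "var_c": 0.40}
print(select_candidates(univariate_p, 0.05))  # ['var_a']
print(select_candidates(univariate_p, 0.1))   # ['var_a', 'var_b']
```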

Page 10: Logistic regression

Plain English:

4- The idea of doing a logistic regression: we have too many variables that are significantly associated with the outcome we are looking for, and we want to know which is stronger in predicting the disease outcome.

5- We look in the output of the statistical program for the odds ratio and its confidence interval (CI) and the significance of each variable, and manipulate the model to select the best combination of explanatory variables.
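The slides leave the fitting to a statistical program. To make that output concrete, here is a minimal pure-Python sketch of maximum-likelihood fitting by Newton-Raphson (one standard method; packages may use an equivalent iteratively reweighted scheme), returning each coefficient's OR with a Wald 95% CI, e^(b ± 1.96 × SE). The function names and the small 2×2 data set are hypothetical, for illustration only.

```python
import math

def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Fit logit(P) = a + b1*X1 + ... by maximum likelihood (Newton-Raphson).

    Returns (coefficients, standard errors); index 0 is the intercept a."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1 / (1 + math.exp(-sum(bj * xj for bj, xj in zip(beta, xi)))) for xi in X]
        w = [pi * (1 - pi) for pi in p]
        score = [sum(X[i][j] * (y[i] - p[i]) for i in range(n)) for j in range(k)]
        info = [[sum(w[i] * X[i][j] * X[i][l] for i in range(n)) for l in range(k)]
                for j in range(k)]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix
    se = [math.sqrt(solve(info, [1.0 if r == j else 0.0 for r in range(k)])[j]) for j in range(k)]
    return beta, se

def summarize(beta, se, z=1.96):
    """Odds ratio e^b with its Wald 95% CI, e^(b - 1.96*SE) to e^(b + 1.96*SE)."""
    return [(math.exp(b), math.exp(b - z * s), math.exp(b + z * s)) for b, s in zip(beta, se)]

# Hypothetical 2x2 data: exposed 30 diseased / 20 not; unexposed 10 diseased / 40 not
rows = [[1]] * 50 + [[0]] * 50
y = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
beta, se = fit_logistic(rows, y)
print(summarize(beta, se)[1])
```

With a single binary explanatory variable, the fitted e^b reproduces the cross-product odds ratio of the 2×2 table, (30 × 40) / (20 × 10) = 6, which makes a convenient sanity check.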

Page 11: Logistic regression

Mathematical model that describes the relationship between an outcome and one or more explanatory variables.

Page 12: Logistic regression

Example: A study was done to test the relationship between HHV8 infection and sexual behavior. Men were asked about histories of sexually transmitted diseases in the past (gonorrhea, syphilis, HSV2, and HIV).

The explanatory variables were the presence of each of the four infections, coded as 0 if the patient had no history or 1 if the patient had a history of that infection, and the patient's age in years.

Page 13: Logistic regression

Dependent outcome: HHV8 infection

Variable     Parameter estimate   P        OR      95% CI
Intercept    -2.2242              0.006
Gonorrhea     0.5093              0.243    1.664   0.71-3.91
Syphilis      1.1924              0.093    3.295   0.82-13.8
HSV2          0.7910              0.0410   2.206   1.03-4.71
HIV           1.6357              0.0067   5.133   1.57-16.73
Age           0.0062              0.76     1.006   0.97-1.05
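As a quick consistency check on this table, each OR should be the exponential of its parameter estimate, e^b. This short Python sketch recomputes them from the reported estimates (the dictionary layout is just a convenience chosen here):

```python
import math

# Parameter estimates (b) and reported ORs from the HHV8 table
estimates = {"Gonorrhea": 0.5093, "Syphilis": 1.1924, "HSV2": 0.7910, "HIV": 1.6357, "Age": 0.0062}
reported_or = {"Gonorrhea": 1.664, "Syphilis": 3.295, "HSV2": 2.206, "HIV": 5.133, "Age": 1.006}

for name, b in estimates.items():
    # OR = e^b; should match the reported OR to three decimal places
    print(f"{name:10s} b = {b:7.4f}  e^b = {math.exp(b):.3f}  reported OR = {reported_or[name]}")
```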

Page 14: Logistic regression

Example:

Chi-square for covariates = 24.5, P = 0.002

This indicates that at least one of the covariates is significantly associated with HHV-8 serostatus.

HSV-2 is positively associated with HHV-8 infection (P = 0.04).

HIV is positively associated with HHV-8 infection (P = 0.007).

Page 15: Logistic regression

Those with a history of HSV-2 have 2.21 times the odds of being HHV-8 positive compared with those with a negative history, after adjusting for the other infections and age.

Those with a history of HIV have 5.1 times the odds of being HHV-8 positive compared with those with a negative history, after adjusting for the other infections and age.

Page 16: Logistic regression

The multiplicative effect of the model suggests that a man who is both HSV2 and HIV seropositive is estimated to have 2.206 × 5.133 = 11.3 times the odds of HHV8 infection compared with a man negative for both, after adjusting for the other two infections and age.
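The 11.3 can also be reached from the parameter estimates: adding the two coefficients on the logit scale and exponentiating gives the same figure as multiplying the two odds ratios. A minimal Python check using the HSV2 and HIV estimates from the table:

```python
import math

b_hsv2, b_hiv = 0.7910, 1.6357  # parameter estimates from the HHV8 table

# Summing coefficients on the logit scale multiplies the odds ratios
combined_or = math.exp(b_hsv2 + b_hiv)
product_of_ors = math.exp(b_hsv2) * math.exp(b_hiv)
print(round(combined_or, 1), round(product_of_ors, 1))
```

The slide's 2.206 × 5.133 uses the rounded ORs; both routes agree to one decimal place.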

In this example, gonorrhea had a significant chi-square in the univariate analysis, but when entered in the model it was not significant (P = 0.243).

(No indication of an independent relationship between a history of gonorrhea and HHV8 seropositivity.)

Page 17: Logistic regression

There is no significant relationship between HHV8 seropositivity and age (P = 0.76); the odds ratio of 1.006 indicates that the estimated odds of HHV8 seropositivity increase by 0.6% for each additional year of age.
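The 0.6% figure is (e^b - 1) × 100 for the age coefficient. Because effects multiply on the odds scale, a 10-year difference corresponds to e^(10 × b). The 10-year span is only an illustration not taken from the slides, and since age was not significant (P = 0.76) this is arithmetic on the estimate, not evidence of an effect.

```python
import math

b_age = 0.0062  # parameter estimate for age (per year) from the HHV8 table

per_year = math.exp(b_age)         # odds multiply by about 1.006 per extra year
per_decade = math.exp(10 * b_age)  # ten one-year effects multiply: e^(10*b) = (e^b)^10
print(f"{(per_year - 1) * 100:.1f}% per year, {(per_decade - 1) * 100:.1f}% per 10 years")
```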

Page 18: Logistic regression

What is the probability that a 51-year-old man has HHV8 infection if he is gonorrhea positive and HSV2 positive but doesn't have the two other diseases (syphilis and HIV)?

Add up the regression coefficients:

$$z = \text{constant} + b_{\text{gonorrhea}} + b_{\text{HSV2}} + b_{\text{age}} \times \text{age}$$

$$z = -2.2242 + 0.5093 + 0.7910 + (0.0062 \times 51) = -0.6077$$
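The same sum as a short Python sketch, with the coefficients from the table and the man's values coded 0/1 as in the study (the dictionary keys are just labels chosen here). Syphilis and HIV are coded 0, so their coefficients drop out:

```python
# Coefficients from the HHV8 table (parameter estimates)
intercept = -2.2242
b = {"gonorrhea": 0.5093, "syphilis": 1.1924, "hsv2": 0.7910, "hiv": 1.6357, "age": 0.0062}

# The 51-year-old man: gonorrhea and HSV2 positive, syphilis and HIV negative
x = {"gonorrhea": 1, "syphilis": 0, "hsv2": 1, "hiv": 0, "age": 51}

# Linear predictor z = logit(P)
z = intercept + sum(b[name] * x[name] for name in b)
print(round(z, 4))  # -0.6077
```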

Page 19: Logistic regression

Probability of this person:

$$P = \frac{e^{z}}{1 + e^{z}}$$

$$P = \frac{e^{-0.6077}}{1 + e^{-0.6077}} = 0.35$$
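Converting the logit back to a probability with this formula, as a minimal Python sketch:

```python
import math

z = -0.6077  # linear predictor (logit) for the 51-year-old man

p = math.exp(z) / (1 + math.exp(z))  # P = e^z / (1 + e^z)
print(round(p, 2))  # 0.35
```

The form 1 / (1 + e^(-z)) is algebraically identical and gives the same value.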

Page 20: Logistic regression

THANK YOU