Logistic regression Who survived Titanic?. 2 The sinking of Titanic Titanic sank April 14th 1912...

Logistic regression Who survived Titanic?
Posted: 21-Dec-2015


Page 1

Logistic regression

Who survived Titanic?

Page 2

The sinking of Titanic

Titanic struck an iceberg on April 14th 1912 and sank early the next morning. Of the 2228 souls on board, 705 survived. The dataset covers 1309 passengers. Who survived?

Page 3

The data

sibsp is the number of siblings and/or spouses accompanying the passenger.
parch is the number of parents and/or children accompanying the passenger.
Some values are missing.
Can we predict who would survive a Titanic II?

pclass  survived  name                                             sex     age     sibsp  parch
1       1         Allen, Miss. Elisabeth Walton                    female  29      0      0
1       1         Allison, Master. Hudson Trevor                   male    0.9167  1      2
1       0         Allison, Miss. Helen Loraine                     female  2       1      2
1       0         Allison, Mr. Hudson Joshua Creighton             male    30      1      2
1       0         Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25      1      2
1       1         Anderson, Mr. Harry                              male    48      0      0
1       1         Andrews, Miss. Kornelia Theodosia                female  63      1      0
1       0         Andrews, Mr. Thomas Jr                           male    39      0      0
1       1         Appleton, Mrs. Edward Dale (Charlotte Lamson)    female  53      2      0

Comment (Carsten D. Mørch): Is this dealt with?
Page 4

Analyzing the data in a (too) simple manner

• Associations between factors without considering interactions
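One simple way to look at such associations is to compare the proportion of survivors within the levels of a single factor, ignoring all other factors. Below is a minimal Python sketch of that computation. It uses only the nine first-class rows from the data slide, which are far too few to say anything about the real dataset; they only illustrate the arithmetic. `ROWS` and `survival_rate_by_group` are hypothetical names.

```python
from collections import defaultdict

# The nine first-class passengers from the data slide, as (sex, survived) pairs.
ROWS = [
    ("female", 1),  # Allen, Miss. Elisabeth Walton
    ("male", 1),    # Allison, Master. Hudson Trevor
    ("female", 0),  # Allison, Miss. Helen Loraine
    ("male", 0),    # Allison, Mr. Hudson Joshua Creighton
    ("female", 0),  # Allison, Mrs. Hudson J C
    ("male", 1),    # Anderson, Mr. Harry
    ("female", 1),  # Andrews, Miss. Kornelia Theodosia
    ("male", 0),    # Andrews, Mr. Thomas Jr
    ("female", 1),  # Appleton, Mrs. Edward Dale
]


def survival_rate_by_group(rows):
    """Proportion of survivors within each level of one factor (no interactions)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for group, survived in rows:
        counts[group][0] += survived
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


rates = survival_rate_by_group(ROWS)
print(rates)
```

Because each factor is looked at on its own, a factor that only matters through another one (for example class acting through sex) can look more or less important than it really is; that is why the slides call this approach (too) simple.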


Page 9

Could we use multiple linear regression to predict survival?

$E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$

                      Multiple linear regression         Logistic regression
Response variable     defined between -∞ and +∞          defined between 0 and 1
Distribution          Normal                             Bernoulli
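To see why a linear model is awkward for a 0/1 response, here is a small Python sketch that fits an ordinary least-squares line to a made-up binary outcome (the data are hypothetical, not the Titanic data). Nothing stops the fitted line from predicting values below 0 or above 1 once x moves away from the observed range.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1*x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Hypothetical binary outcome: 0 for x < 5, 1 for x >= 5.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
b0, b1 = fit_simple_linear(xs, ys)

# Predictions outside the data range are not valid probabilities.
for x in (-5, 4, 15):
    print(x, round(b0 + b1 * x, 3))
```

The prediction at x = -5 is negative and the one at x = 15 exceeds 1, which is exactly what the logit transformation on the next slide avoids.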

Page 10

Logit transformation is modeled linearly

The logistic function

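$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$

$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)} = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n))}$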
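The two formulas above are inverses of each other: the logit maps a probability to the whole real line, and the logistic function maps the linear predictor back into (0, 1). A minimal Python sketch of both, plus the linear predictor; the function names are our own. The logistic function is written in two algebraically equivalent branches (the two forms on the slide) so that `math.exp` never overflows for very negative z.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))  # 1 / (1 + exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)                   # exp(z) / (1 + exp(z))


def linear_predictor(betas, xs):
    """z = beta_0 + beta_1*x_1 + ... + beta_n*x_n; betas[0] is the intercept."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


print(logistic(logit(0.3)))
print(logistic(linear_predictor([1.0, 2.0], [3.0])))
```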

Page 11

The sigmoidal curve

$p = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$

[Figure: sigmoidal curve of p (0 to 1) against x (-6 to 6) for β0 = 0, β1 = 1]

Page 12

The sigmoidal curve

• The intercept shifts the curve along the x-axis: the midpoint p = 0.5 lies at x = -β0/β1

[Figure: sigmoidal curves of p against x for β0 = 0, 2 and -2, all with β1 = 1]
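The effect of the intercept can be checked numerically: p = 0.5 exactly where β0 + β1·x = 0, so changing β0 only moves that midpoint left or right. A small Python sketch using the three intercepts from the figure (function names are hypothetical):

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def midpoint(beta0, beta1):
    """x at which p = 0.5, i.e. where beta0 + beta1*x = 0."""
    return -beta0 / beta1


# beta0 = 2 moves the curve 2 units to the left, beta0 = -2 moves it 2 units right.
for beta0 in (0.0, 2.0, -2.0):
    x_mid = midpoint(beta0, 1.0)
    print(beta0, x_mid, logistic(beta0 + 1.0 * x_mid))
```

The shape of the curve does not change; only its position does.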

Page 13

The sigmoidal curve

[Figure: sigmoidal curves of p against x for β0 = 0 with β1 = 1, 2 and 0.5]

• The intercept shifts the curve along the x-axis

• A large regression coefficient (in absolute value) gives a steep curve: the risk factor strongly influences the probability
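How steep the curve is can be made precise: the derivative of p with respect to x is β1·p·(1 - p), which at the midpoint (p = 0.5) equals β1/4. A short Python sketch evaluating it for the three slopes in the figure; names are our own.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def slope(beta0, beta1, x):
    """dp/dx for p = 1/(1 + exp(-(beta0 + beta1*x))), which equals beta1 * p * (1 - p)."""
    p = logistic(beta0 + beta1 * x)
    return beta1 * p * (1.0 - p)


# With beta0 = 0 the midpoint is x = 0, where the slope is beta1 / 4.
for beta1 in (0.5, 1.0, 2.0):
    print(beta1, slope(0.0, beta1, 0.0))
```

Doubling β1 doubles the steepness at the midpoint, so the probability swings from near 0 to near 1 over a shorter range of x.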

Page 14

The sigmoidal curve

[Figure: sigmoidal curves of p against x for β0 = 0 with β1 = 1 and β1 = -1]

• The intercept shifts the curve along the x-axis

• A large regression coefficient (in absolute value) gives a steep curve: the risk factor strongly influences the probability

• A positive regression coefficient means the risk factor increases the probability; a negative one means it decreases it
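The sign of the coefficient decides the direction of the curve. A Python sketch evaluating both curves from the figure on a few x values (the helper names are hypothetical): with β1 = 1 the probability rises with x, with β1 = -1 it falls, and because β0 = 0 the two curves mirror each other, p(x; -β1) = 1 - p(x; β1).

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability_curve(beta0, beta1, xs):
    """p = logistic(beta0 + beta1*x) evaluated at each x."""
    return [logistic(beta0 + beta1 * x) for x in xs]


xs = [-6, -3, 0, 3, 6]
up = probability_curve(0.0, 1.0, xs)     # positive coefficient
down = probability_curve(0.0, -1.0, xs)  # negative coefficient

for x, u, d in zip(xs, up, down):
    print(x, round(u, 3), round(d, 3))
```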

Page 15

Logistic regression of the Titanic data

Page 16

Logistic regression of the Titanic data

1. Summary of data
2. Coding of the dependent variable
3. Coding of the categorical explanatory variable: First class: 1, Second class: 2, Third class: reference
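Coding a categorical variable against a reference category means replacing it with one 0/1 indicator per non-reference level; the reference level (here third class) is the case where all indicators are 0, so its effect is absorbed into the intercept. SPSS does this internally; the Python sketch below only illustrates the idea, and `code_pclass` is a hypothetical name.

```python
def code_pclass(pclass):
    """Indicator (dummy) coding with third class as the reference category.

    Returns (first_class, second_class); third class becomes (0, 0).
    """
    if pclass not in (1, 2, 3):
        raise ValueError(f"unexpected pclass: {pclass!r}")
    return (int(pclass == 1), int(pclass == 2))


for c in (1, 2, 3):
    print(c, code_pclass(c))
```

Each indicator then gets its own coefficient, which is compared with third class.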

Page 17

Logistic regression of the Titanic data

A fit of the null model, which contains only the intercept. Usually not interesting in itself.

• The overall probability of survival is 500/1309 = 0.382. The classification cutoff is 0.5, so all passengers are classified as non-survivors.

• Basically tests whether the null model is sufficient. It almost certainly is not.

• Shows that survival is related to pclass (which is not in the null model).
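The null-model numbers follow directly from the counts on this slide: the intercept-only model gives every passenger the same probability (the overall survival proportion), its intercept is the log-odds of that proportion, and with a 0.5 cutoff everyone is predicted to die. A Python sketch of that arithmetic, using the 500 survivors and 1309 passengers from the slide:

```python
import math

SURVIVORS = 500
PASSENGERS = 1309

p = SURVIVORS / PASSENGERS                 # overall survival proportion
beta0 = math.log(p / (1.0 - p))            # null-model intercept (log-odds)
predicted = 1 if p >= 0.5 else 0           # same predicted class for every passenger
correct = SURVIVORS if predicted == 1 else PASSENGERS - SURVIVORS
accuracy = correct / PASSENGERS            # share correctly classified

print(round(p, 3), round(beta0, 3), predicted, round(accuracy, 3))
```

So the intercept is about -0.48, and the null model still classifies 809/1309 ≈ 61.8% correctly, simply by calling everyone a non-survivor. That is the baseline later models must beat.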

Page 18

Logistic regression of the Titanic data

1. Omnibus test: uses a likelihood-ratio (LR) test to check whether adding the pclass variable makes the model better. It does! But it is only compared with the null model, so this is no surprise.

2. Model summary: other measures of goodness of fit.

3. Classification table: by including pclass, 67.7% of passengers are correctly classified.

4. Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic regression coefficient. Exp(B) is the odds ratio, so the odds of survival for first-class passengers are 4.7 (confidence interval 3.6-6.3) times higher than for passengers in third class (the reference class).
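Exp(B) turns a coefficient on the log-odds scale into an odds ratio, and the interval is usually obtained by exponentiating B ± 1.96·SE (a Wald interval, approximately 95%). The standard error is not reproduced in this transcript, so the Python sketch below is a generic helper rather than a recomputation of the slide's numbers; `odds_ratio_with_ci` and the SE used in the example are hypothetical.

```python
import math


def odds_ratio_with_ci(b, se, z_crit=1.96):
    """Return (exp(B), lower, upper) with a Wald interval exp(B -/+ z_crit*SE).

    z_crit = 1.96 gives an approximate 95% interval.
    """
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


# B = ln(4.7) reproduces the point estimate; SE = 0.15 is an assumed value.
point, lower, upper = odds_ratio_with_ci(math.log(4.7), 0.15)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

Because the interval is symmetric on the log scale, it is asymmetric around Exp(B), just like 3.6-6.3 around 4.7.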

Page 19

Logistic regression of the Titanic data now adding family relations

1. ‘3 or more’ is set as the reference group by SPSS
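Using ‘3 or more’ as a group implies that the family counts were collapsed into categories before coding. A Python sketch of that step, assuming (this is not stated on the slides) the categories 0, 1, 2 and ‘3 or more’, followed by indicator coding with ‘3 or more’ as the reference. `collapse_count`, `LEVELS` and `indicators` are hypothetical names.

```python
# Assumed category levels; only '3 or more' is confirmed by the slide.
LEVELS = ["0", "1", "2", "3 or more"]
REFERENCE = "3 or more"


def collapse_count(n, top=3):
    """Collapse a count such as sibsp or parch into '0', '1', '2' or '3 or more'."""
    if n < 0:
        raise ValueError("count must be non-negative")
    return f"{top} or more" if n >= top else str(n)


def indicators(category):
    """One 0/1 indicator per level except the reference level."""
    return {level: int(category == level) for level in LEVELS if level != REFERENCE}


for sibsp in (0, 1, 5):
    cat = collapse_count(sibsp)
    print(sibsp, cat, indicators(cat))
```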

Page 20

Logistic regression of the Titanic data now adding family relations

1. The model correctly classifies 79.1% of the passengers
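The percentage in the classification table comes from comparing each passenger's predicted probability with the cutoff and counting how often the predicted class matches the observed one. A Python sketch with made-up probabilities (not the model's output); treating p exactly equal to the cutoff as "survived" is our own assumption.

```python
def classification_table(y_true, p_hat, cutoff=0.5):
    """Counts of (observed, predicted) pairs and the percentage correctly classified.

    Assumption: p >= cutoff is predicted as 1 (survived).
    """
    table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        table[(y, pred)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100.0 * correct / len(y_true)


table, pct = classification_table([0, 0, 1, 1], [0.2, 0.6, 0.7, 0.4])
print(table, pct)
```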

Page 21

Logistic regression of the Titanic data now adding family relations

1. Basically all factors seem to affect the probability of survival.

Page 22

What about age?

• Linear associations are easy to model, because the factor enters the linear predictor directly.

• But the association does not really look linear; maybe a third-order polynomial?

• Three new age factors are calculated: the first, second, and third powers of age, each based on age divided by its standard deviation.
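A Python sketch of the three age factors: age is rescaled by its standard deviation and then raised to the first, second and third power. The worked prediction later in the slides uses (25 - 30)/14.41 for a 25-year-old, which suggests age was also centred (apparently at 30) before scaling, so the sketch takes an optional `center`; that centring is our reading, not something the slide states. Rescaling keeps the cubed term from growing to values in the hundreds of thousands. `age_polynomial` is a hypothetical name.

```python
def age_polynomial(age, sd, center=0.0):
    """First-, second- and third-order age terms on a rescaled scale.

    center = 0 gives plain age / sd, as described on the slide;
    a non-zero center is an assumption inferred from the worked example.
    """
    z = (age - center) / sd
    return z, z ** 2, z ** 3


print(age_polynomial(25, 14.41, center=30))
print(age_polynomial(2.0, 1.0))
```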

Page 23

What about age?

• The third-order age factor did not add significantly to the model.

• With the third-order polynomial, the model correctly classifies 79.4% of the passengers, versus 79.1% before.

• ParChild is no longer a significant factor and can be omitted from the model.

Page 24

Using the model to predict survival

• Omitting the second- and third-order age factors and ParChild

• What is the probability that a 25-year-old woman, accompanied only by her husband and holding a second-class ticket, would survive the Titanic?

$z = -3.929 - 0.589 \cdot \frac{-5}{14.41} + 1.718 + 2.552 + 0.926 = 1.4714$

$p = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1.4714}} = 0.8133$
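The arithmetic of the worked example can be replayed in Python. The slide does not say which coefficient belongs to which factor (class, sex, spouse), so the three remaining terms are kept as plain numbers copied from the equation.

```python
import math

# Terms copied from the worked example on the slide.
intercept = -3.929
age_term = -0.589 * (-5) / 14.41       # age coefficient times the rescaled age
other_terms = [1.718, 2.552, 0.926]    # remaining coefficients, unlabelled on the slide

z = intercept + age_term + sum(other_terms)
p = 1.0 / (1.0 + math.exp(-z))        # logistic function
print(f"z = {z:.4f}, p = {p:.4f}")     # prints: z = 1.4714, p = 0.8133
```

So the model gives her roughly an 81% chance of survival.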

Page 25

Analysing interactions of selected factors

Interactions: pclass * sex, age * sex, pclass * Siblings/Parents. But the model does not converge…
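An interaction such as pclass * sex enters the model as product terms of the already coded variables, so the effect of being female can differ between classes. A Python sketch, assuming a hypothetical coding (`female` = 1 for women, `first`/`second` as class indicators with third class as reference, `age_z` as rescaled age); the dictionary keys and `with_interactions` are our own names, not SPSS output.

```python
def with_interactions(row):
    """Return a copy of a coded predictor row with product (interaction) terms added."""
    out = dict(row)
    out["first_x_female"] = row["first"] * row["female"]    # pclass * sex
    out["second_x_female"] = row["second"] * row["female"]  # pclass * sex
    out["age_z_x_female"] = row["age_z"] * row["female"]    # age * sex
    return out


print(with_interactions({"first": 1, "second": 0, "female": 1, "age_z": 0.5}))
```

Every interaction adds coefficients that must be estimated from the same 1309 passengers, so some class/sex/family combinations may contain very few passengers.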

Page 26

Analysing interactions of selected factors

Collapsing the number of siblings/spouses into fewer categories eliminated their mutual interaction.

Page 27

Is it realistic that Leonardo survives and the chick dies?