Introduction | The Model | Results

Predicting survival on the Titanic.

Moderator: C. Fodya.

Group Members:

M. Durojaye, R. Rakotonirainy, S. Shabalala, A. Akinyelu, D. Raphulu, S. Simelane.

Graduate Student Workshop.

January 11, 2014

Titanic MISG 2014

Outline

1 Introduction

2 The Model
    Approach
    Analysis

3 Results

Figure: The sinking Titanic (Photo: D. Paris)

Introduction

On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg.
1502 out of 2224 passengers and crew died.
This sensational tragedy led to better safety regulations for ships.

Introduction cont...

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew.
Some groups of people were more likely to survive than others, such as women, children, and the upper class.

Figure: Schematic of the Titanic (Photo: D. Raphulu)

Material

We deal with two datasets, training data and testing data.
For the training set, all information for each passenger is given.

Below we have our test data set with the empty survival column.

Main Purpose

Our main aim is to fill in the survival column of the test data set.

How?
    Finding patterns and building models from the training data.
    Prediction.

Tools and algorithms
    Python, Excel and C#.
    Random forest is the machine learning algorithm used.

Testing
    Model accuracy was measured by submission to the Kaggle competition.
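
The workflow on this slide (predict a value for every test passenger, then submit the filled-in column) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's code: the column names PassengerId, Sex and Survived are assumed to follow the Kaggle Titanic files, and fill_survival, gender_rule, to_submission_csv and the two sample passengers are hypothetical. The simple rule stands in for any fitted model, such as a random forest.

```python
import csv
import io


def fill_survival(test_rows, predict):
    """Return (PassengerId, Survived) pairs, one per test passenger."""
    return [(row["PassengerId"], predict(row)) for row in test_rows]


def gender_rule(row):
    """Placeholder model (assumption): women survive (1), men do not (0)."""
    return 1 if row["Sex"] == "female" else 0


def to_submission_csv(predictions):
    """Format the predictions as CSV text with a PassengerId,Survived header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    writer.writerows(predictions)
    return buffer.getvalue()


# Hypothetical test passengers whose survival column is still empty.
test_rows = [
    {"PassengerId": "892", "Sex": "male"},
    {"PassengerId": "893", "Sex": "female"},
]

submission = to_submission_csv(fill_survival(test_rows, gender_rule))
print(submission)
```

Swapping gender_rule for another function with the same signature is enough to try a different model without touching the submission step.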

Variable Rank

We ranked each variable according to the correlation between itself and survival.

Variable    Corr. coef.
Gender      0.5434
PClass      0.3385
Cabin       0.3196
Fare        0.257
Embark      0.1018
Parch       0.0816
Age         0.0772
SibSp       0.0353
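
A ranking of this kind could be computed as in the sketch below. The slides do not say which correlation coefficient was used or how categorical variables were encoded, so this assumes the Pearson coefficient on numerically encoded columns and ranks by absolute value (the table lists only positive values). The function names and the toy data are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_variables(columns, survived):
    """Sort variables by |correlation with survival|, strongest first."""
    scores = {name: abs(pearson(values, survived))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, NOT the Titanic data: six passengers, 1 = survived.
survived = [1, 1, 0, 0, 1, 0]
columns = {
    "SibSp": [0, 1, 1, 0, 1, 0],
    "PClass": [1, 2, 3, 3, 1, 2],
    "Gender": [1, 1, 0, 0, 1, 0],  # assumed encoding: 1 = female, 0 = male
}

ranking = rank_variables(columns, survived)
for name, score in ranking:
    print(f"{name:8s} {score:.4f}")
```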

Figure: Variables used in the model and their categories. Gender (M, F); PClass (1st, 2nd, 3rd); Fare (a, b, c, d); Cabin (yes, no); Embark (S, C, Q).

Single Variable

Predicting survival using gender.

Gender    Survived
Women     0.74
Men       0.18
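
Proportions like these come from counting survivors within each group of the training data. The sketch below shows one way to do that and to turn the result into a prediction rule; the field names Sex and Survived are assumed from the Kaggle files, the ten sample rows are invented, and the 0.5 cut-off is an assumption rather than something stated on the slides.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Proportion of survivors within each value of the given column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survivors[row[key]] += row["Survived"]
    return {group: survivors[group] / totals[group] for group in totals}


# Invented training rows: 3 of 4 women and 1 of 5 men survive.
rows = (
    [{"Sex": "female", "Survived": 1}] * 3
    + [{"Sex": "female", "Survived": 0}]
    + [{"Sex": "male", "Survived": 1}]
    + [{"Sex": "male", "Survived": 0}] * 4
)

rates = survival_rate_by(rows, "Sex")

# Predict survival for a group when more than half of it survived.
rule = {group: int(rate > 0.5) for group, rate in rates.items()}
print(rates, rule)
```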

Three variables combination

Proportion of survivors by class (rows) and fare bracket (columns), using Gender + Pclass + Fare.

Female

             0-9     10-19   20-29   > 30
1st class    0       0       0.8     0.97
2nd class    0       0.91    0.9     1
3rd class    0.59    0.58    0.3     0.12

Male

             0-9     10-19   20-29   > 30
1st class    0       0       0.4     0.38
2nd class    0       0.15    0.16    0.21
3rd class    0.11    0.23    0.13    0.24
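
Tables of this kind can be used directly as a lookup model. The sketch below copies the values from the two tables above and predicts survival when the looked-up proportion reaches a threshold. It rests on several assumptions that the slides do not state: that the columns are fare brackets (consistent with the three-variable model being Gender + Pclass + Fare), that the brackets are [0, 10), [10, 20), [20, 30) and 30 or more, and that 0.5 is the cut-off. The names fare_bin and predict are hypothetical.

```python
# Survival proportions per class, one list entry per fare bracket.
FEMALE = {1: [0, 0, 0.8, 0.97], 2: [0, 0.91, 0.9, 1], 3: [0.59, 0.58, 0.3, 0.12]}
MALE = {1: [0, 0, 0.4, 0.38], 2: [0, 0.15, 0.16, 0.21], 3: [0.11, 0.23, 0.13, 0.24]}
TABLES = {"female": FEMALE, "male": MALE}


def fare_bin(fare):
    """Map a non-negative fare to a column index 0..3 (assumed bracket edges)."""
    return min(int(fare // 10), 3)


def predict(sex, pclass, fare, threshold=0.5):
    """Return 1 (survived) if the table proportion reaches the threshold."""
    proportion = TABLES[sex][pclass][fare_bin(fare)]
    return int(proportion >= threshold)


print(predict("female", 1, 80.0))  # first-class woman, high fare
print(predict("female", 3, 25.0))  # third-class woman, fare in 20-29
print(predict("male", 1, 80.0))    # first-class man, high fare
```

Under these assumptions the only women predicted not to survive are those in third class paying 20 or more, and no men are predicted to survive.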

Four variables combination

Female + Cabin Crew

             0-9     10-19   20-29   > 30
1st class    0       0       0       1
2nd class    0       0.92    0.9     1
3rd class    0.59    0.57    0.3     0.125

Female + Not Cabin Crew

             0-9     10-19   20-29   > 30
1st class    0       0       0.83    0.97
2nd class    0       0.88    0       1
3rd class    0       0.6     1       0

Cont...

Male + Cabin crew

             0-9     10-19   20-29   > 30
1st class    0       0       0.33    0.15
2nd class    0       0.14    0.09    0.15
3rd class    0.11    0.21    0.125   0.24

Male + Non Cabin crew

             0-9     10-19   20-29   > 30
1st class    0       0       0.44    0.42
2nd class    0       0.5     0.66    1
3rd class    0.2     1       0       0

Results

Measuring the model accuracy

Variables          Model accuracy   Random forest
Gender             77.1 %           77.14 %
Gender + Pclass    76.0 %           76.02 %
3                  77.9 %           77.93 %
4                  76.5 %           76.51 %
5                  69.1 %           69.17 %

3 = Gender + Pclass + Fare
4 = Gender + Pclass + Fare + Cabin
5 = Gender + Pclass + Fare + Cabin + Embark
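
The accuracy figures above are the percentage of test passengers whose outcome was predicted correctly. Kaggle computes this against labels that are hidden from participants, so the sketch below only illustrates the calculation on invented data; the function name and the two lists are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of positions where the prediction matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Invented labels for eight passengers (1 = survived, 0 = did not survive).
predicted = [1, 0, 0, 1, 0, 1, 0, 0]
actual = [1, 0, 1, 1, 0, 0, 0, 0]

score = accuracy(predicted, actual)
print(f"{score:.1f} %")
```

The same function can be applied to a held-out part of the training data to compare variable combinations before submitting.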

Discussion

A simple model is not always a bad model.
Building a sophisticated model (by adding too many variables) might not improve the prediction accuracy of the model.
A moderate model (not too simple and not too complex) is sufficient for developing a robust prediction system.

Thank you!!!!
Any questions are most welcome!!!!
