Term Project (Titanic Dataset)

Manpreet Singh
Feb 25, 2022

Manpreet Singh UTA ID:1001873982

Importing the dataset:

The Titanic dataset consists of three files:

train.csv
test.csv
gender_submission.csv
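
A minimal sketch of how the three files could be loaded with pandas. The helper name load_titanic and the data_dir argument are hypothetical, and it assumes all three CSV files sit in the same folder.

```python
from pathlib import Path

import pandas as pd


def load_titanic(data_dir="."):
    """Read the three Titanic CSV files from data_dir (a hypothetical location)."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")
    gender_submission = pd.read_csv(data_dir / "gender_submission.csv")
    return train, test, gender_submission
```

Calling train, test, gender_submission = load_titanic("path/to/data") would then give one DataFrame per file.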

Let's explore our dataset:

The train.csv file contains 12 different columns, which together describe every passenger.

Counting the null (NaN) values in the dataset:
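
One common way to count missing values per column is pandas' isnull().sum(). The sketch below runs on a tiny made-up sample rather than the real train.csv, so the counts are illustrative only.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for train.csv (values are illustrative only).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Sex": ["male", "female", "female", "male"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = sample.isnull().sum()
print(null_counts)
```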

Calculating total number of survivors:
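
Since the Survived column holds 1 for a survivor and 0 otherwise, summing it gives the number of survivors. A small sketch on made-up rows (not the real data):

```python
import pandas as pd

# Made-up rows; in the real dataset this would be the train.csv DataFrame.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 0, 0],
})

# Survived is 1 for survivors, so the column sum is the survivor count.
total_survivors = int(sample["Survived"].sum())

# value_counts() shows how many passengers fall into each outcome.
survival_counts = sample["Survived"].value_counts()
print(total_survivors)
print(survival_counts)
```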

Survivors with respect to gender:

Investigating further, we find that almost three-quarters of the female passengers survived the sinking.

We also find that people travelling in first class had a higher probability of survival than those in the other classes.
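
Both breakdowns above (by gender and by passenger class) can be sketched with a groupby on the Survived column. The rows below are made up purely to show the mechanics; the column names Sex, Pclass and Survived are the ones used in train.csv.

```python
import pandas as pd

# Made-up rows for illustration; they do not reproduce the real survival rates.
sample = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male"],
    "Pclass": [1, 1, 2, 3, 1, 2, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of passengers who survived.
rate_by_sex = sample.groupby("Sex")["Survived"].mean()
rate_by_class = sample.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```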

To complete our dataset, we fill the missing (NaN) ages with the mean of all the ages, i.e. 29.
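
A minimal sketch of this imputation step using pandas' fillna, shown on a few made-up ages rather than the real column:

```python
import numpy as np
import pandas as pd

# Made-up ages; two entries are missing.
sample = pd.DataFrame({"Age": [20.0, np.nan, 30.0, 40.0, np.nan]})

# mean() skips NaN by default, so this is the mean of the known ages.
mean_age = sample["Age"].mean()

# Replace every missing age with that mean and store it back in the DataFrame.
sample["Age"] = sample["Age"].fillna(mean_age)
print(sample)
```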

After training the Random Forest Classifier with the following parameters, we saw the accuracy of our model increase by about 0.48 percentage points.

Contribution:

I explored the data using the matplotlib and seaborn Python libraries, plotting graphs to learn more about the data and to reveal hidden patterns and missing values. I used a Python library to check whether any column of the dataset contains null values. I replaced the null values in the 'Age' column with the median of the other ages and wrote the result back to the original dataset. For our Random Forest Classifier, I added another parameter, 'Age', to the feature list and also changed a hyperparameter of the classifier.

I changed max_depth from 5 to 4. Together, these changes increased the accuracy from 77.511 to 77.99.
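
A rough sketch of what this setup could look like with scikit-learn. Only the 'Age' feature and max_depth=4 come from the description above; the other feature names, n_estimators=100 and random_state=1 are assumptions, and the data is randomly generated so the block runs on its own (it will not reproduce the reported accuracy).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for train.csv so the sketch is self-contained.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "SibSp": rng.integers(0, 3, size=n),
    "Parch": rng.integers(0, 3, size=n),
    "Age": rng.uniform(1, 80, size=n),
    "Survived": rng.integers(0, 2, size=n),
})

# Assumed base features plus 'Age', the feature added in this project.
features = ["Pclass", "Sex", "SibSp", "Parch", "Age"]

# One-hot encode the text column 'Sex'; numeric columns pass through unchanged.
X = pd.get_dummies(data[features])
y = data["Survived"]

# Simple hold-out split: first 150 rows to train, last 50 to validate.
X_train, X_valid = X.iloc[:150], X.iloc[150:]
y_train, y_valid = y.iloc[:150], y.iloc[150:]

# max_depth=4 is the changed hyperparameter; the other settings are assumed.
model = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=1)
model.fit(X_train, y_train)
predictions = model.predict(X_valid)

# Fraction of validation rows predicted correctly.
accuracy = (predictions == y_valid.to_numpy()).mean()
print(accuracy)
```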