This project analyses a dataset of suicides in Indian states for the years 2001-2012. The data was collected from Kaggle.com. The data science steps followed in this project are data pre-processing, exploratory data analysis, data modelling and prediction.
Python Libraries used
- NumPy
- Pandas
- Matplotlib
- Seaborn
- OS
Pre-processing operations
- Checking for missing values in dataset
- Remove the records whose age group is '0-100+', since this catch-all range makes the dataset unclear for analysis
- Remove the records where the cause of suicide is 'Illegitimate Pregnancy' and the gender is 'Male'
- Remove the records that don't specify any cause for the suicide attempt
- Remove the records that don't specify the profession of the victim
- Remove the records where the profession is 'Housewife' and the gender is 'Male'
- The dataset contains records with the open-ended age group '60+', which isn't suitable for analysis. These values are therefore replaced with random values between 60 and 100
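The pre-processing rules above could be sketched in Pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`State`, `Year`, `Type`, `Gender`, `Age_group`, `Total`), the `UNSPECIFIED` labels, the `clean` helper and the toy rows are all assumptions made for the example, as is the choice of whole-number ages with both bounds included.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only (not real figures). The column names
# State, Year, Type, Gender, Age_group and Total are assumptions.
df = pd.DataFrame({
    "State": ["State A"] * 4 + ["State B"] * 3,
    "Year": [2001, 2001, 2002, 2002, 2001, 2002, 2002],
    "Type": [
        "Unemployment",
        "Illegitimate Pregnancy",
        "Housewife",
        "Causes Not known",
        "Unemployment",
        "Married",
        "Others (Please Specify)",
    ],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female", "Male"],
    "Age_group": ["15-29", "30-44", "45-59", "45-59", "60+", "0-100+", "15-29"],
    "Total": [5, 0, 0, 3, 2, 7, 1],
})

# Hypothetical labels marking an unspecified cause or profession.
UNSPECIFIED = ["Causes Not known", "Others (Please Specify)"]


def clean(raw, seed=0):
    """Apply the pre-processing rules listed above to a copy of `raw`."""
    out = raw.copy()

    # Drop the catch-all age group.
    out = out[out["Age_group"] != "0-100+"]

    # Drop rows whose cause or profession is inconsistent with 'Male'.
    male = out["Gender"] == "Male"
    inconsistent = male & out["Type"].isin(["Illegitimate Pregnancy", "Housewife"])
    out = out[~inconsistent]

    # Drop rows with no specified cause or profession.
    out = out[~out["Type"].isin(UNSPECIFIED)].copy()

    # Replace '60+' with a random integer age in [60, 100]
    # (integer ages and inclusive bounds are assumptions).
    rng = np.random.default_rng(seed)
    is_60_plus = out["Age_group"] == "60+"
    ages = rng.integers(60, 101, size=int(is_60_plus.sum()))
    out.loc[is_60_plus, "Age_group"] = ages.astype(str)

    return out.reset_index(drop=True)


# Missing values per column, checked before any rows are removed.
missing_per_column = df.isna().sum()
cleaned = clean(df)
```

Passing a fixed `seed` keeps the random replacement of '60+' reproducible between runs.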
Exploratory Data Analysis
- Number of people who committed suicide between 2001 and 2012
- Which year recorded the highest number of suicides?
- Which gender recorded more suicides?
- Which state recorded the most suicides?
- Top 5 states that recorded the highest number of suicides
- State that recorded the highest number of suicides due to unemployment
- Distribution of male and female suicides among different age groups
- Means adopted to commit suicide
- Major reasons for suicide attempts
- Major reasons for which the number of female suicides exceeds the number of male suicides
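Several of the questions above reduce to grouped sums over a count column. The sketch below shows one possible way to answer them with Pandas. The column names (`State`, `Year`, `Type`, `Gender`, `Age_group`, `Total`) and the toy rows are assumptions made for illustration, not the real data or the project's actual code.

```python
import pandas as pd

# Toy rows for illustration only (not real figures); column names are assumed.
df = pd.DataFrame({
    "State": ["State A", "State A", "State A", "State B", "State B", "State B"],
    "Year": [2001, 2001, 2002, 2001, 2002, 2002],
    "Type": ["Unemployment", "Family Problems", "Unemployment",
             "Unemployment", "Family Problems", "Family Problems"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Age_group": ["15-29", "30-44", "30-44", "15-29", "45-59", "15-29"],
    "Total": [4, 6, 3, 1, 2, 5],
})

# Total number of suicides over the whole period.
total_suicides = int(df["Total"].sum())

# Year and gender with the highest totals.
peak_year = df.groupby("Year")["Total"].sum().idxmax()
top_gender = df.groupby("Gender")["Total"].sum().idxmax()

# Top 5 states by total (fewer if the data has fewer states).
top_states = df.groupby("State")["Total"].sum().nlargest(5)

# State with the most suicides attributed to unemployment.
unemployment = df[df["Type"] == "Unemployment"]
top_unemployment_state = unemployment.groupby("State")["Total"].sum().idxmax()

# Male/female distribution across age groups.
age_gender = df.groupby(["Age_group", "Gender"])["Total"].sum().unstack(fill_value=0)

# Reasons for which the female total exceeds the male total.
by_reason = df.pivot_table(index="Type", columns="Gender", values="Total",
                           aggfunc="sum", fill_value=0)
female_higher = by_reason[by_reason["Female"] > by_reason["Male"]].index.tolist()
```

Tables such as `age_gender` can be passed to Matplotlib or Seaborn for plotting, for example with `age_gender.plot(kind="bar")`.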
Data Source: Kaggle.com
Tableau was used to plot a heat map on a geographical map of India. The cleaned dataset produced by the pre-processing step was used as the input to Tableau.