Feature_Importance_ANN

Aim -

To find key drivers that influence the output of an Artificial Neural Network
To determine the relative importance of these influencing factors

Project Description

The bank in this case wants to predict whether a customer will subscribe to a term deposit. To run a successful telemarketing campaign, the bank would like to know which customers are highly likely to subscribe to its offer.

The dataset provided by the bank records the number of days since the last contact, which captures recency, and the number of contacts performed during the current and the previous campaign, which captures the frequency of the marketing effort.

For modelling, I used the recency and frequency metrics to train the model because these metrics have high predictive power. The model is a binary classifier that outputs either 1 (the customer will subscribe) or 0 (the customer will not subscribe).
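As a rough illustration of the recency and frequency idea, the sketch below derives both metrics for a single client record. It assumes the column names `pdays`, `campaign`, and `previous` and the convention that `pdays = 999` marks a client who was not contacted before; the README does not name these columns, so check them against Data_Dictionary.md. Summing the two contact counts into one frequency value is an illustrative choice, not necessarily what the notebook does.

```python
# Assumed convention: pdays == 999 means "not previously contacted".
NOT_PREVIOUSLY_CONTACTED = 999


def recency_frequency(record):
    """Derive recency and frequency metrics from one client record.

    recency_days: days since the last contact, or None if never contacted.
    frequency: contacts in the current campaign plus contacts before it.
    """
    pdays = record["pdays"]
    return {
        "recency_days": None if pdays == NOT_PREVIOUSLY_CONTACTED else pdays,
        "frequency": record["campaign"] + record["previous"],
    }


# Hypothetical toy records, not taken from the dataset.
clients = [
    {"pdays": 3, "campaign": 2, "previous": 1},
    {"pdays": 999, "campaign": 1, "previous": 0},
]
summary = [recency_frequency(c) for c in clients]
for row in summary:
    print(row)
```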


About Data Set

Title: Bank Telemarketing (with social/economic context)
Past Usage: The full dataset (bank-additional-full.csv) was described and analyzed in:

S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems (2014), doi:10.1016/j.dss.2014.03.001.
All records are ordered by date (from May 2008 to November 2010). Detailed Description can be found @ Data_Dictionary.md.


Data Exploration

In this phase of the project, I addressed common data challenges such as poor data quality, multicollinearity, and correlation between pairs of variables. The key insights are as follows -

Plot	Insight
Plot 1: Duration of call vs. Subscription	The longer the last call, the higher the probability that the client will subscribe.
Plot 2: Histogram of Duration with Subscription Overlay	Subscription declines when the call duration is close to 50 min. However, the outcome is almost certainly 'yes' when the duration is close to 65 min and 'no' when it exceeds 65 min.
Plot 3: Months vs. Subscription	Campaigns are most successful in Dec, Mar, Oct, and Sep.
Plot 4: Job Type vs. Subscription	Surprisingly, students and retired people are more likely to subscribe to a term deposit.
Plot 5: Previous Outcome vs. Current Subscription	If the previous campaign was a success for a client, that client's propensity to subscribe to the term deposit is fairly high.
Plot 6: Education Level vs. Subscription	Illiterate clients show the highest propensity to subscribe. Among the remaining groups, the propensity to subscribe rises with the level of education.

From the preliminary data analysis, I concluded that the duration of the last call, the outcome of the previous campaign, and the month in which the customer was contacted have a significant impact on the final outcome. However, exploratory analysis alone cannot rank these factors relative to each other.

Detailed Description can be found @ Bank_Marketing_Exploratory_Analysis.ipynb.


Modeling Approach

Steps -

- Feature engineering and data transformation of categorical and numerical attributes
- Split the dataset into predictors and response variable. In this case, the response is whether the customer subscribes or not.
- Split the dataset into training set (75%) and test set (25%)
- Create the model
- Fine-tune the hyperparameters
- Test the performance of the final model
- Report performance metrics
- Identify the most important factors based on socioeconomic characteristics of the customers
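The first three steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the records, the helper names `one_hot` and `split_rows`, and the fixed seed are all hypothetical, and one-hot encoding is only one possible transformation for categorical attributes.

```python
import random


def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and return (train, test) with the given test share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records; "y" is the response (1 = subscribed).
records = [
    {"duration": 120, "month": "may", "y": 0},
    {"duration": 640, "month": "mar", "y": 1},
    {"duration": 95, "month": "may", "y": 0},
    {"duration": 410, "month": "oct", "y": 1},
    {"duration": 230, "month": "may", "y": 0},
    {"duration": 310, "month": "sep", "y": 0},
    {"duration": 780, "month": "dec", "y": 1},
    {"duration": 60, "month": "may", "y": 0},
]

encoded = one_hot(records, "month")
train, test = split_rows(encoded, test_fraction=0.25)

# Separate predictors from the response after splitting so rows stay aligned.
X_train = [{k: v for k, v in row.items() if k != "y"} for row in train]
y_train = [row["y"] for row in train]
X_test = [{k: v for k, v in row.items() if k != "y"} for row in test]
y_test = [row["y"] for row in test]

print(len(X_train), "training rows,", len(X_test), "test rows")
```

Splitting whole rows first and separating predictors from the response afterwards keeps each label attached to its features.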


Model Performance

Overall, our model has achieved an accuracy of 91.66% for the test set.

The confusion matrix for this classification model is shown below.

Confusion Matrix

Performance Metrics

Number of False Positives, FP = 438
Number of False Negatives, FN = 392
Number of True Positives, TP = 664
Number of True Negatives, TN = 8457

False Positive Rate (FPR, Type 1 Error) = FP / (FP + TN) = 0.0492
False Negative Rate (FNR, Type 2 Error) = FN / (FN + TP) = 0.3712
True Positive Rate (TPR) = TP / (TP + FN) = 0.6288
True Negative Rate (TNR) = TN / (TN + FP) = 0.9508
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.9166
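The rates above follow directly from the four confusion-matrix counts. A short sketch for recomputing them, where the function name `classification_rates` is only illustrative:

```python
def classification_rates(tp, tn, fp, fn):
    """Compute the standard rates from confusion-matrix counts."""
    return {
        "fpr": fp / (fp + tn),  # false positive rate (Type 1 error)
        "fnr": fn / (fn + tp),  # false negative rate (Type 2 error)
        "tpr": tp / (tp + fn),  # true positive rate
        "tnr": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Counts reported for the test set.
rates = classification_rates(tp=664, tn=8457, fp=438, fn=392)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and TNR sum to 1, as do FNR and TPR, which gives a quick consistency check on any reported figures.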


Feature Importance

SHAP Values

Explanation

The horizontal bar plot shows the average impact of each feature on the model output.

Here, the duration of the last contact, the number of employees, and whether the last contact month was May contribute the most to estimating whether a customer subscribes to a term deposit.

SHAP values measure feature importance at the row level. Each value represents how a feature influences the prediction for a single row relative to the other features in that row and to the average outcome in the dataset. Features are ranked in decreasing order of influence, and the length of each bar shows the magnitude of the influence that feature has on the final outcome.
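To make the row-level idea concrete, the sketch below computes exact Shapley values for a toy model by enumerating every feature coalition, then averages their absolute values per feature, which is the quantity a SHAP bar plot displays. It does not use the `shap` library or the project's neural network: the linear `predict` function, its weights, the feature names, and the background rows are all hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: a linear score over three illustrative features.
FEATURES = ["duration", "nr_employed", "month_may"]
WEIGHTS = [0.004, -0.002, -0.3]


def predict(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


def shapley_values(predict, row, background):
    """Exact Shapley values for one row against a background sample."""
    n = len(row)

    def value(coalition):
        # Features in the coalition keep the row's value; the rest are
        # filled from each background row, and predictions are averaged.
        outputs = [
            predict([row[i] if i in coalition else ref[i] for i in range(n)])
            for ref in background
        ]
        return sum(outputs) / len(outputs)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical rows: [duration, nr_employed, month_may].
background = [
    [100, 5200, 1],
    [300, 5000, 0],
    [200, 5100, 1],
    [600, 5000, 0],
]

per_row = [shapley_values(predict, row, background) for row in background]
base_value = sum(predict(ref) for ref in background) / len(background)

# Global importance: mean absolute Shapley value per feature.
importance = {
    name: sum(abs(phi[k]) for phi in per_row) / len(per_row)
    for k, name in enumerate(FEATURES)
}
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

For each row, the values sum to the difference between that row's prediction and the average prediction over the background, which is what "relative to the average outcome in the dataset" means in practice.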


Summary

The optimal set of hyperparameters for our neural network is -
- neurons = 31
- learning_rate = 0.1
- batch_size = 2048
- optimizer = adam
- epochs = 100
Accuracy = 91.66%
Recall for (y=0) = 95%
Recall for (y=1) = 63%
The duration of the last contact is the most influential factor in determining whether a customer will subscribe to a term deposit.


Author

@Abbas S.

License

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Acknowledgments

Inspiration, code snippets, etc.

clkride/Feature_Importance_ANN
