This project was developed for the 1st Alura Data Science Challenge.
The purpose of this project is to develop a classification model for churn prediction for a fictitious company called Alura Voz, in order to help them reduce thier churn rate.
To start, let's clarify what Churn is. The term refers to the rate of customer or user loss over a certain period of time. It is a crucial indicator for companies in various sectors as it directly impacts revenue and business health. Therefore, when we refer to Churn in our dataset, we will be discussing whether a customer continues to make purchases with the company or not.
During this workshop, we will follow the CRISP-DM method, a widely used approach in Data Science projects, which consists of: business understanding, data understanding, data preparation, modeling, evaluation, and implementation. Here, we will cover as much as possible of the concepts up to the evaluation phase.
Along the way, we will address data analysis techniques, preprocessing, feature selection, and the choice and evaluation of the performance of Machine Learning models.
Let's go! 😎
- Exploratory Data Analysis
- Data Visualization
- Data preprocessing
- Data balancing with over-sampling
- Machine Learning
- Classification models (Logistic Regression, Random Forest Classifier and Gradient Boosting)
- Model Evaluation
- Python
- Pandas, Numpy
- Matplotlib, Seaborn, Plotly
- Scikit-Learn
- Imblearn
- Jupyter Notebook
This project aims to develop a classification model to predict customer churn. For that, we are using data about the company's customers to train our models and make predictions. These data contains informations such as:
customerID
: unique identifier number of customerChurn
: if the client has left or not the companygender
: gender (male or female)SeniorCitizen
: if the client is older than 65 years oldPartner
: if the client has a partner or notDependents
: if the client has dependants or nottenure
: contract months subcriptionPhoneService
: telephone service subcriptionMultipleLines
: more than one phone line subcriptionInternetService
: internet provider subcriptionOnlineSecurity
: aditional subcription for online securityOnlineBackup
: aditional subcription for online backupDeviceProtection
: aditional subcription for device protectionTechSupport
: aditional subcription for technial supportStreamingTV
: cable subscriptionStreamingMovies
: streaming subscriptionContract
: type of contractPaperlessBilling
: if the client prefers to receive the bill onlinePaymentMethod
: payment methodCharges.Monthly
: client's total services of the monthCharges.Total
: client's total spents
In order to do that, we are using an over-sampling technique for handling imbalanced classes and traning classification models such as:
- Logistic Regression
- Random Forests
- Gradient Boosting.
- data exploration/descriptive statistics
- data processing/cleaning
- statistical modeling
- model evaluation and optimization
#alurachallengedatascience1