This analysis investigates a dataset of 7,043 telecom customers to understand the key drivers behind customer churn. Beyond standard exploratory analysis, this project focuses on identifying high-risk customer segments through advanced feature engineering and uncovering "hidden" data quality issues that often break predictive models.
A common pitfall in this dataset is the TotalCharges column. Although it looks numeric, it is loaded as an object type because it contains blank strings (" ") for customers with 0 tenure.
- The Fix: I identified 11 rows with this issue, coerced the errors to NaN, and imputed them with 0 (since 0 tenure implies 0 total charge) rather than dropping valuable data.
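The fix described above can be sketched with pandas roughly as follows. This is a minimal illustration rather than the project's exact code: the tiny `raw` frame and the helper name `clean_total_charges` are made up for the example, and the `tenure` column name is assumed to match the dataset.

```python
import pandas as pd

# Tiny stand-in for the raw data (illustrative values only).
raw = pd.DataFrame({
    "tenure": [0, 12, 0, 48],
    "TotalCharges": [" ", "840.0", " ", "4795.2"],
})

def clean_total_charges(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse TotalCharges and impute 0 for zero-tenure rows."""
    out = df.copy()
    # Blank strings cannot be parsed, so errors="coerce" turns them into NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Impute only where the value is missing AND tenure is 0, so any other
    # unexpected NaN stays visible instead of being silently zeroed.
    needs_fix = out["TotalCharges"].isna() & (out["tenure"] == 0)
    out.loc[needs_fix, "TotalCharges"] = 0.0
    return out

# Count blank entries before cleaning, then clean.
n_blank = int((raw["TotalCharges"].str.strip() == "").sum())
cleaned = clean_total_charges(raw)
```

Restricting the imputation to zero-tenure rows keeps the "0 tenure implies 0 total charge" reasoning explicit in the code.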
- Tenure_Group: Converted continuous months into lifecycle stages (New, 1-2 Years, Loyal).
- Total_Services: An aggregation score (0-6) counting how many add-on services (TechSupport, Streaming, etc.) a user subscribes to.
- Customer_Segment (RFM Adaptation): I created a custom segmentation matrix combining Loyalty (Tenure) and Value (Monthly Charges) to name specific user groups.
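One way the three engineered features could be built is sketched below. Several details are assumptions, not taken from the analysis: the bin edges (12 and 24 months), the `HIGH_SPEND` threshold, the six add-on column names, and every segment name except "New Whale".

```python
import pandas as pd

# Assumed add-on service columns; adjust to the actual dataset schema.
ADDON_COLS = ["OnlineSecurity", "OnlineBackup", "DeviceProtection",
              "TechSupport", "StreamingTV", "StreamingMovies"]
NEW_MAX_MONTHS = 12   # assumption: "New" means the first 12 months
HIGH_SPEND = 80.0     # assumption: hypothetical monthly-charge cutoff

def segment(row) -> str:
    """Hypothetical 2x2 matrix: loyalty (tenure) x value (monthly charges)."""
    is_new = row["tenure"] <= NEW_MAX_MONTHS
    is_high = row["MonthlyCharges"] >= HIGH_SPEND
    if is_new and is_high:
        return "New Whale"
    if is_new:
        return "New Low-Spend"
    if is_high:
        return "Established High-Spend"
    return "Established Low-Spend"

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Bin tenure in months into lifecycle stages (right-inclusive edges).
    out["Tenure_Group"] = pd.cut(
        out["tenure"],
        bins=[-1, 12, 24, float("inf")],
        labels=["New", "1-2 Years", "Loyal"],
    )
    # Count add-ons marked "Yes"; "No" and "No internet service" count as 0.
    out["Total_Services"] = (out[ADDON_COLS] == "Yes").sum(axis=1)
    out["Customer_Segment"] = out.apply(segment, axis=1)
    return out

# Illustrative rows only.
customers = pd.DataFrame({
    "tenure": [1, 18, 60, 3],
    "MonthlyCharges": [95.0, 45.0, 110.0, 25.0],
    "OnlineSecurity": ["No", "Yes", "Yes", "No internet service"],
    "OnlineBackup": ["No", "Yes", "Yes", "No internet service"],
    "DeviceProtection": ["No", "No", "Yes", "No internet service"],
    "TechSupport": ["No", "No", "Yes", "No internet service"],
    "StreamingTV": ["Yes", "No", "Yes", "No internet service"],
    "StreamingMovies": ["Yes", "No", "Yes", "No internet service"],
})
featured = add_features(customers)
```

Keeping the thresholds as named constants makes it easy to test how sensitive the segments are to the chosen cutoffs.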
- The "New Whale" Problem: High-spending new customers have a 75% churn rate. These users buy premium plans but leave almost immediately.
- Actionable Advice: The onboarding process for premium users is failing; an immediate retention offer is needed in Month 1.
- The Price of Loyalty: Churn is highly correlated with price unless the customer has been retained for 2+ years.
- Service "Lock-in": Customers with 3+ additional services (like Security + Backup) are 40% less likely to churn.
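Figures like the ones above are typically computed as grouped churn rates. The sketch below shows the mechanics on a small made-up frame; its values are illustrative and do not reproduce the reported results. It assumes a `Churn` column holding "Yes"/"No", that "less likely" means a relative reduction in churn rate, and the helper name `churn_rate_by` is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; not the real data.
toy = pd.DataFrame({
    "Customer_Segment": ["New Whale"] * 4 + ["Other"] * 4,
    "Total_Services": [0, 1, 3, 1, 4, 3, 0, 5],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

def churn_rate_by(df: pd.DataFrame, key: pd.Series) -> pd.Series:
    """Share of churned customers within each group defined by `key`."""
    churned = df["Churn"].eq("Yes")
    return churned.groupby(key).mean()

# Churn rate per customer segment.
segment_rates = churn_rate_by(toy, toy["Customer_Segment"])

# Churn rate for customers with 3+ add-on services versus fewer.
service_band = (toy["Total_Services"] >= 3).map(
    {True: "3+ services", False: "0-2 services"}
)
service_rates = churn_rate_by(toy, service_band)

# Relative reduction in churn for the 3+ group compared with the 0-2 group.
relative_reduction = 1 - service_rates["3+ services"] / service_rates["0-2 services"]
```

Reporting the group sizes alongside each rate helps when checking whether a small segment is driving a headline number.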