anmolg1997/Lead-Scoring

Lead Scoring Model

This project builds a logistic regression model that assigns each lead a score from 0–100 reflecting conversion likelihood. Sales teams can use those scores to prioritize outreach on high-probability leads, which raises effective conversion compared to treating all leads equally.

Approach

  1. Data cleaning — Handle missing values, duplicates, and placeholder levels (e.g. categorical “Select”).
  2. EDA — Explore distributions, relationships to conversion, and data quality.
  3. Feature engineering — Encode categoricals, treat outliers, and prepare inputs for modeling.
  4. Logistic regression — Train and evaluate with scikit-learn and statsmodels (including RFE-based feature selection where used).
  5. Lead scoring (0–100) — Map model output to an interpretable score so “hot” leads rank higher than “cold” ones.
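The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebooks' actual code: the data is synthetic, the `Unknown` fill value, the choice of three RFE features, and the 70/30 split are all hypothetical, and only scikit-learn is used (the statsmodels refinement is omitted).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Leads.csv (values are invented for illustration).
rng = np.random.default_rng(0)
n = 500
time_spent = rng.integers(0, 2000, n)
p_convert = 1 / (1 + np.exp(-(time_spent - 1000) / 300))
leads = pd.DataFrame({
    "Total Time Spent on Website": time_spent,
    "Total Visits": rng.integers(0, 20, n),
    "Lead Source": rng.choice(["Google", "Direct Traffic", "Select"], n),
    "Converted": (rng.random(n) < p_convert).astype(int),
})

# 1. Data cleaning: treat the "Select" placeholder as missing, drop duplicates.
leads["Lead Source"] = leads["Lead Source"].replace("Select", np.nan)
leads["Lead Source"] = leads["Lead Source"].fillna("Unknown")  # assumed fill value
leads = leads.drop_duplicates()

# 3. Feature engineering: one-hot encode categoricals, then scale.
X = pd.get_dummies(leads.drop(columns="Converted"), drop_first=True, dtype=float)
y = leads["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# 4. Logistic regression with RFE-based feature selection.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X_train_s, y_train)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s[:, selector.support_], y_train)

# 5. Lead scoring: map the predicted conversion probability to 0-100.
proba = model.predict_proba(X_test_s[:, selector.support_])[:, 1]
lead_score = np.rint(proba * 100).astype(int)
```

Step 2 (EDA) is left out because it produces plots and summaries and does not change the data. Fitting the scaler on the training split only keeps holdout information out of the model.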

Key results

When leads are prioritized by model score, the target conversion rate on the focused segment is about 80%, versus a baseline of roughly 30% across all leads. This indicates stronger sales targeting than uniform outreach.
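The comparison between a focused segment and the baseline can be computed as below. This is a sketch with invented numbers, not the repository's results: the scores, outcomes, and the cutoff of 70 are all hypothetical.

```python
import numpy as np

# Toy lead scores (0-100) and observed outcomes; values are invented.
scores = np.array([95, 88, 82, 75, 60, 45, 30, 22, 15, 5])
converted = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

cutoff = 70  # hypothetical score above which a lead counts as "hot"
hot = scores >= cutoff

baseline_rate = converted.mean()   # conversion rate across all leads
hot_rate = converted[hot].mean()   # conversion rate within the hot segment
lift = hot_rate / baseline_rate    # how much better targeted outreach does

print(f"baseline={baseline_rate:.2f} hot={hot_rate:.2f} lift={lift:.3f}")
# prints: baseline=0.40 hot=0.75 lift=1.875
```

Raising the cutoff generally increases the hot-segment conversion rate while shrinking the number of leads contacted, so the cutoff is a business trade-off and not only a modeling choice.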

Dataset

Roughly 9,000 historical leads with attributes such as Lead Source, Total Time Spent on Website, Total Visits, Last Activity, and other marketing and profile fields. The binary target is Converted (1 = converted, 0 = not). Column definitions are documented in Leads Data Dictionary.xlsx (included in the repository).
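A quick sanity check of the dataset might look like the sketch below. The helper `summarize_leads` is hypothetical and the sample frame is invented; with the real file you would pass `pd.read_csv("Leads.csv")` instead.

```python
import pandas as pd

def summarize_leads(df, target="Converted"):
    """Return row count, overall conversion rate, and per-column missing share."""
    return {
        "rows": len(df),
        "conversion_rate": float(df[target].mean()),
        "missing_share": df.isna().mean().round(3).to_dict(),
    }

# Tiny invented sample using column names from the dataset description.
sample = pd.DataFrame({
    "Lead Source": ["Google", None, "Direct Traffic", "Google"],
    "Total Time Spent on Website": [1200, 35, 0, 640],
    "Converted": [1, 0, 0, 1],
})

summary = summarize_leads(sample)
print(summary["rows"], summary["conversion_rate"])
```

Note that placeholder levels such as "Select" are not counted as missing here; they would need to be converted to missing values first.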

Tech stack

Python · Pandas · NumPy · scikit-learn · Matplotlib · Seaborn

(Notebooks may also use statsmodels for inference-style logistic regression and feature refinement.)

Getting started

Clone the repository, install dependencies for the notebooks you plan to run (e.g. pandas, numpy, scikit-learn, matplotlib, seaborn, and optionally statsmodels), then start Jupyter:

jupyter notebook

Open a notebook from the repo root and run cells in order. Ensure Leads.csv is in the working directory expected by the notebook paths.

Main files

  • Leads.csv — Lead-level records used for modeling.
  • Leads Data Dictionary.xlsx — Field definitions for the dataset.
  • Lead scoring case study .ipynb / LeadScoringCaseStudy-Ver2.ipynb — Analysis and model notebooks (see below).

Notebooks

| File | Description |
| --- | --- |
| Lead scoring case study .ipynb | Earlier walkthrough: loading and cleaning data, profiling-style exploration (e.g. pandas-profiling), and foundational EDA. |
| LeadScoringCaseStudy-Ver2.ipynb | Structured end-to-end pipeline: data cleaning, EDA (uni- and bivariate), dummy variables, scaling, logistic regression with RFE/statsmodels refinement, evaluation (including ROC), and lead scores / hot-lead identification on holdout data. |
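The ROC evaluation and hot-lead identification mentioned for the second notebook could look roughly like this. It is a sketch on invented holdout values, and picking the cutoff by Youden's J statistic (TPR minus FPR) is an assumption made here for illustration; the notebook may choose its cutoff differently.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Invented holdout labels and predicted conversion probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1])
proba = np.array([0.10, 0.30, 0.35, 0.80, 0.20, 0.90, 0.70, 0.40, 0.60])

# Area under the ROC curve summarizes ranking quality across all cutoffs.
auc = roc_auc_score(y_true, proba)

# Pick the probability cutoff that maximizes TPR - FPR (assumed criterion).
fpr, tpr, thresholds = roc_curve(y_true, proba)
cutoff = thresholds[np.argmax(tpr - fpr)]

# Convert probabilities to 0-100 lead scores and flag hot leads.
lead_score = np.rint(proba * 100).astype(int)
is_hot = proba >= cutoff

print(f"AUC={auc:.2f}, cutoff={cutoff:.2f}, hot leads={int(is_hot.sum())}")
```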

License

This project is licensed under the MIT License (Copyright © 2021 Anmol Jaiswal).
