Lead Scoring Model

This project builds a logistic regression model that assigns each lead a score from 0–100 reflecting conversion likelihood. Sales teams can use those scores to prioritize outreach on high-probability leads, which raises effective conversion compared to treating all leads equally.

Approach

Data cleaning — Handle missing values, duplicates, and placeholder levels (e.g. categorical “Select”).
EDA — Explore distributions, relationships to conversion, and data quality.
Feature engineering — Encode categoricals, treat outliers, and prepare inputs for modeling.
Logistic regression — Train and evaluate with scikit-learn and statsmodels (including RFE-based feature selection where used).
Lead scoring (0–100) — Map model output to an interpretable score so “hot” leads rank higher than “cold” ones.
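
The last step above can be sketched as follows. This is a minimal illustration, assuming the score is simply the predicted conversion probability multiplied by 100 and rounded; the notebooks may use a different mapping. The function name `to_lead_score` and the example probabilities are hypothetical.

```python
import numpy as np


def to_lead_score(probabilities):
    """Map conversion probabilities in [0, 1] to integer lead scores in [0, 100].

    Assumption: score = round(probability * 100). This is one common
    convention, not necessarily the one used in the notebooks.
    """
    probs = np.asarray(probabilities, dtype=float)
    if np.any((probs < 0) | (probs > 1)):
        raise ValueError("probabilities must lie in [0, 1]")
    return np.rint(probs * 100).astype(int)


# Hypothetical probabilities for four leads (illustrative values only).
example_probs = [0.05, 0.42, 0.80, 0.97]
scores = to_lead_score(example_probs)
print(scores)
```

Because the mapping is monotonic, ranking leads by score gives the same order as ranking them by predicted probability; the 0–100 scale only makes the output easier to read.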

Key results

When leads are prioritized by model score, the target conversion rate on the focused segment is about 80%, versus a baseline of roughly 30% across all leads. This indicates stronger sales targeting than uniform outreach.
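
The comparison above is between two rates: conversion across all leads, and conversion among leads whose score clears a cutoff. A rough sketch of that calculation follows. The helper `conversion_rates`, the cutoff of 60, and the toy data are all made up for illustration and are not the project's results.

```python
import numpy as np


def conversion_rates(scores, converted, cutoff):
    """Return (baseline_rate, focused_rate) for a given score cutoff.

    baseline_rate: share of all leads that converted.
    focused_rate:  share of leads with score >= cutoff that converted.
    """
    scores = np.asarray(scores)
    converted = np.asarray(converted)
    hot = scores >= cutoff
    baseline_rate = converted.mean()
    focused_rate = converted[hot].mean() if hot.any() else float("nan")
    return baseline_rate, focused_rate


# Toy data: ten hypothetical leads, not taken from Leads.csv.
toy_scores = [91, 85, 78, 66, 40, 35, 22, 15, 9, 4]
toy_converted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

baseline, focused = conversion_rates(toy_scores, toy_converted, cutoff=60)
print(f"baseline={baseline:.2f}, focused={focused:.2f}")
```

Raising the cutoff usually increases the focused rate while shrinking the number of leads contacted, so the cutoff is a trade-off between precision and coverage.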

Dataset

Roughly 9,000 historical leads with attributes such as Lead Source, Total Time Spent on Website, Total Visits, Last Activity, and other marketing and profile fields. The binary target is Converted (1 = converted, 0 = not). Column definitions are documented in Leads Data Dictionary.xlsx (included in the repository).
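
As a rough sketch of how such a table might be prepared, the snippet below drops duplicate rows and treats the placeholder level "Select" as missing. The helper `clean_leads` and the toy rows are hypothetical; only the column names come from the description above, and the cell values are invented.

```python
import numpy as np
import pandas as pd


def clean_leads(df, placeholder="Select"):
    """Drop exact duplicate rows and convert a placeholder level to NaN.

    Assumption: "Select" means the user never chose a value, so it is
    treated the same as a missing entry.
    """
    cleaned = df.drop_duplicates().copy()
    return cleaned.replace(placeholder, np.nan)


# Toy frame with invented values; real data would come from Leads.csv.
toy = pd.DataFrame(
    {
        "Lead Source": ["Google", "Select", "Direct Traffic", "Google"],
        "Total Time Spent on Website": [120, 0, 450, 120],
        "Converted": [0, 0, 1, 0],
    }
)

cleaned = clean_leads(toy)
missing_share = cleaned.isna().mean()  # fraction of missing values per column
print(missing_share)
```

The per-column missing share is a convenient input for deciding which columns to impute and which to drop.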

Tech stack

Python · Pandas · NumPy · scikit-learn · Matplotlib · Seaborn

(Notebooks may also use statsmodels for inference-style logistic regression and feature refinement.)

Getting started

Clone the repository, install dependencies for the notebooks you plan to run (e.g. pandas, numpy, scikit-learn, matplotlib, seaborn, and optionally statsmodels), then start Jupyter:

jupyter notebook

Open a notebook from the repo root and run cells in order. Ensure Leads.csv is in the working directory expected by the notebook paths.
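
Before launching a notebook, it can help to confirm that the dependencies are importable. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]
OPTIONAL = ["statsmodels"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_modules(REQUIRED))
print("Missing optional:", missing_modules(OPTIONAL))
```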

Main files

Leads.csv — Lead-level records used for modeling.
Leads Data Dictionary.xlsx — Field definitions for the dataset.
Lead scoring case study .ipynb / LeadScoringCaseStudy-Ver2.ipynb — Analysis and model notebooks (see below).

Notebooks

| File | Description |
| --- | --- |
| `Lead scoring case study .ipynb` | Earlier walkthrough: loading and cleaning data, profiling-style exploration (e.g. pandas-profiling), and foundational EDA. |
| `LeadScoringCaseStudy-Ver2.ipynb` | Structured end-to-end pipeline: data cleaning, EDA (uni- and bivariate), dummy variables, scaling, logistic regression with RFE/statsmodels refinement, evaluation (including ROC), and lead scores / hot-lead identification on holdout data. |
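
The modeling stages listed for the Ver2 notebook (dummy variables, scaling, RFE, logistic regression, ROC evaluation, holdout scoring) can be sketched with scikit-learn alone. This is a minimal sketch on synthetic data, not the notebook's code: the generated values, the choice of three retained features, the 70/30 split, and the score mapping are all assumptions, and the statsmodels refinement step is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Leads.csv (values are generated, not real leads).
rng = np.random.default_rng(0)
n = 400
frame = pd.DataFrame(
    {
        "Total Time Spent on Website": rng.exponential(300, n),
        "Total Visits": rng.poisson(3, n),
        "Lead Source": rng.choice(["A", "B", "C"], n),
    }
)
logit = 0.004 * frame["Total Time Spent on Website"] - 1.5
frame["Converted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Dummy-encode categoricals, then hold out 30% of rows for evaluation.
X = pd.get_dummies(frame.drop(columns="Converted"), drop_first=True, dtype=float)
y = frame["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale, keep 3 features via recursive feature elimination, then fit the model.
model = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)
model.fit(X_train, y_train)

# Evaluate on the holdout set and convert probabilities to 0-100 scores.
test_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_prob)
lead_score = np.rint(test_prob * 100).astype(int)
print(f"holdout ROC AUC: {auc:.3f}")
```

Wrapping the steps in a `Pipeline` keeps scaling and feature selection fitted on the training split only, which avoids leaking holdout information into the model.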
