Problem Statement
This project investigates how students' performance (test scores) is influenced by variables such as gender, ethnicity, parental education level, lunch type, and completion of a test preparation course.
Data Collection
Source: Kaggle - Students Performance in Exams
The dataset contains 8 columns and 1000 rows.
Dataset Information
Gender: Sex of students (Male/Female)
Race/Ethnicity: Group A, B, C, D, or E
Parental Level of Education: Highest education level attained by the parents (e.g., bachelor's degree, some college)
Lunch: Type of lunch before the test (standard/free or reduced)
Test Preparation Course: Whether the student completed the course before the test
Math Score: Math test score
Reading Score: Reading test score
Writing Score: Writing test score
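As a rough illustration of the loading and inspection step, the sketch below reads a CSV and checks that the eight columns listed above are present. The lowercase column names are an assumption about the Kaggle file's headers, `load_scores` is a hypothetical helper, and the three rows are made up so the example runs without the real file. In practice you would pass the path to the downloaded CSV instead of the in-memory sample.

```python
import io

import pandas as pd

# Assumed column headers; adjust if your copy of the CSV names them differently.
EXPECTED_COLUMNS = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
    "math score",
    "reading score",
    "writing score",
]

# Made-up rows for illustration only; these are NOT taken from the dataset.
SAMPLE_CSV = """gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,none,70,75,74
male,group C,some college,free/reduced,completed,65,68,66
female,group A,high school,standard,completed,80,85,88
"""


def load_scores(source):
    """Read the CSV and raise if any expected column is absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


df = load_scores(io.StringIO(SAMPLE_CSV))
print(df.shape)   # (rows, columns) of the sample
print(df.dtypes)  # score columns should be numeric
```

Failing early on a missing column keeps a renamed or truncated file from producing confusing errors later in the notebook.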
Project Goals
- Perform exploratory data analysis (EDA) to uncover insights.
- Examine relationships between test scores and categorical variables.
- Generate visualizations to represent the findings effectively.
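One simple way to examine the relationship between scores and a categorical variable is to compare group means. The sketch below does this with a pandas `groupby`. The helper `mean_scores_by`, the column names, and the four rows are illustrative assumptions, not the notebook's actual code or data.

```python
import pandas as pd

# Assumed names of the three score columns.
SCORE_COLS = ["math score", "reading score", "writing score"]

# Made-up rows for illustration only; these are NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "test preparation course": ["none", "completed", "none", "completed"],
        "math score": [60, 70, 50, 80],
        "reading score": [62, 74, 58, 86],
        "writing score": [61, 75, 55, 85],
    }
)


def mean_scores_by(df, category):
    """Return the mean of each score column for every level of `category`."""
    return df.groupby(category)[SCORE_COLS].mean()


summary = mean_scores_by(toy, "test preparation course")
print(summary)
```

The same helper can be called with any of the other categorical columns (for example `"lunch"` or `"gender"`), and the resulting table can be passed to a bar plot for the visualization goal.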
Steps in Analysis
- Data loading and inspection
- Data cleaning and preprocessing
- Data exploration and feature analysis
- Insights and visualizations
- Conclusion and recommendations
Dependencies
Python 3.x
Libraries: Pandas, NumPy, Matplotlib, Seaborn
Usage
- Clone the repository:
git clone
- Install the required libraries:
pip install -r requirements.txt
- Run the notebook:
jupyter notebook
- Open 2.0-Student Performance EDA.ipynb and execute the cells sequentially.
Insights and Observations
There are no missing or duplicate values in the dataset.
Mean scores are similar across the three subjects, at approximately 66 to 69.
Test preparation and parental education level stand out as key factors associated with differences in performance.
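Checks like the ones behind the first two observations can be sketched as follows. `quality_report` is a hypothetical helper, the column names are assumed, and the rows are made up, so the numbers it prints are not the dataset's results.

```python
import pandas as pd


def quality_report(df, score_cols):
    """Count missing cells and duplicate rows, and compute per-subject means."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "subject_means": df[score_cols].mean().round(2).to_dict(),
    }


# Made-up rows for illustration only; these are NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "gender": ["female", "male", "female"],
        "math score": [70, 65, 80],
        "reading score": [75, 68, 85],
        "writing score": [74, 66, 88],
    }
)

report = quality_report(toy, ["math score", "reading score", "writing score"])
print(report)
```

Running the same function on the full DataFrame would be one way to reproduce the missing-value, duplicate, and mean-score checks reported above.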