GitHub - CHARLES-GitHub-2002/DATA-CLEANING-

Reward Program Data Analysis - Data Cleaning Process

Overview

This document outlines the steps I followed to clean the dataset used for analyzing mentor-mentee sessions. The objective was to ensure the dataset was accurate, consistent, and ready for further analysis.

Data Cleaning Steps

Data Exploration

I began by exploring the dataset to understand its structure, identify patterns, and detect any potential issues such as missing values, duplicates, and inconsistent formatting. The exploration helped to flag areas requiring cleaning for effective analysis.
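The exploration itself was done in Excel. As a rough illustration of the same checks in code, the sketch below counts blank cells per column and rows that exactly repeat an earlier row. The sample rows and the `profile` helper are hypothetical and only stand in for the real workbook.

```python
from collections import Counter

# Hypothetical sample rows; the real data lives in the Excel workbook.
rows = [
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": 1, "Session_Date": "2024-01-05"},
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": 1, "Session_Date": "2024-01-05"},
    {"Mentor_ID": "", "Mentee_Name": "john doe", "Session_Number": 2, "Session_Date": "05/01/2024"},
]


def profile(rows):
    """Count blank cells per column and rows that repeat an earlier row exactly."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                missing[column] += 1
        # A hashable fingerprint of the whole row, independent of column order.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


report = profile(rows)
print(report)
```

A summary like this points to which of the later steps (deduplication, missing values, standardization) a dataset actually needs.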

Removing Duplicates

To remove duplicates, I focused on the following key columns:

Mentor_ID
Mentee_Name
Session_Number
Session_Date

Using Excel's built-in functionality, I identified and removed records whose values in all four columns exactly matched those of another record. In total, 9 duplicate records were removed, so each session now appears only once.
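The same rule can be sketched in plain Python, assuming each record is a dictionary keyed by column name. The `remove_duplicates` helper and the sample sessions are hypothetical. The sketch keeps the first occurrence of each key combination and drops later repeats.

```python
KEY_COLUMNS = ("Mentor_ID", "Mentee_Name", "Session_Number", "Session_Date")


def remove_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep the first row for each combination of key values; drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(column) for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample sessions; the second row repeats the first on all four keys.
sessions = [
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": 1, "Session_Date": "2024-01-05"},
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": 1, "Session_Date": "2024-01-05"},
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": 2, "Session_Date": "2024-01-12"},
]

deduplicated = remove_duplicates(sessions)
removed_count = len(sessions) - len(deduplicated)
print(removed_count, "duplicate record(s) removed")
```

Matching on all four columns together, rather than on any single one, means a mentor can still have several sessions with the same mentee as long as the session number or date differs.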

Handling Missing Values

I identified essential fields whose absence would negatively affect the analysis:

Mentor_ID
Mentee_Name
Session_Date

Records missing any of these essential fields were removed entirely. Non-essential fields with missing values were retained, as their absence does not impact further analysis. Excel's conditional formatting was used to highlight blank cells so that the affected records could be found and removed efficiently.
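A minimal sketch of this filter in plain Python, again assuming dictionary records. The `is_blank` and `drop_incomplete` helpers and the sample rows are hypothetical. Whitespace-only cells are treated as blank, which is an assumption of the sketch rather than something the Excel step is known to have done.

```python
ESSENTIAL_FIELDS = ("Mentor_ID", "Mentee_Name", "Session_Date")


def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def drop_incomplete(rows, essential=ESSENTIAL_FIELDS):
    """Remove rows missing any essential field; other blanks are left alone."""
    return [
        row for row in rows
        if not any(is_blank(row.get(field)) for field in essential)
    ]


# Hypothetical sample rows.
records = [
    {"Mentor_ID": "M001", "Mentee_Name": "Ada Obi", "Session_Number": None, "Session_Date": "2024-01-05"},
    {"Mentor_ID": "", "Mentee_Name": "John Doe", "Session_Number": 2, "Session_Date": "2024-01-06"},
    {"Mentor_ID": "M002", "Mentee_Name": "Jane Roe", "Session_Number": 1, "Session_Date": "   "},
]

complete = drop_incomplete(records)
print(len(complete), "of", len(records), "records kept")
```

The first row survives even though its Session_Number is blank, because that column is not in the essential list.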

Standardizing the Dataset

I took the following steps to standardize the dataset:

Capitalization: All entries in the Mentor_ID and Mentee_Name columns were standardized by capitalizing names uniformly.
Abbreviations: I checked for abbreviations that might skew further analysis and ensured consistent full names were used.
Date Format: All dates were converted to the standard format YYYY-MM-DD, ensuring consistency across the dataset.
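The capitalization and date steps can be sketched as follows. Several details are assumptions made for illustration: that IDs are upper-cased and names title-cased, and that the raw dates arrive in one of the layouts listed in `DATE_FORMATS`. The helper names are hypothetical, and abbreviation handling is not covered because it depends on the actual values in the data.

```python
from datetime import datetime

# Assumed input layouts; extend this tuple to match what the raw data contains.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def standardize_date(text):
    """Return the date as YYYY-MM-DD, trying each assumed input layout in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")


def standardize_row(row):
    """Return a copy of the row with uniform casing and an ISO-style date."""
    cleaned = dict(row)
    cleaned["Mentor_ID"] = row["Mentor_ID"].strip().upper()
    # Collapse repeated spaces, then capitalize each word of the name.
    cleaned["Mentee_Name"] = " ".join(row["Mentee_Name"].split()).title()
    cleaned["Session_Date"] = standardize_date(row["Session_Date"])
    return cleaned


# Hypothetical raw row.
raw = {"Mentor_ID": " m001 ", "Mentee_Name": "ada  obi", "Session_Date": "05/01/2024"}
standardized = standardize_row(raw)
print(standardized)
```

Raising an error on an unrecognized date, instead of passing it through unchanged, makes unexpected formats visible before they reach the analysis. Day-first versus month-first ambiguity (for example 05/01/2024) has to be settled from knowledge of the source data; the sketch assumes day-first.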

Column Removal

To streamline the dataset, I removed Column A, which contained information irrelevant to the analysis, so that only the key data points remain.
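For dictionary records, dropping a column can be sketched as below. The column name `Notes` is a hypothetical stand-in, since the header of Column A is not stated here, and `drop_column` is an illustrative helper.

```python
def drop_column(rows, column):
    """Return copies of the rows without the named column."""
    return [{key: value for key, value in row.items() if key != column} for row in rows]


# Hypothetical rows with an extra column that is not needed for the analysis.
with_extra = [
    {"Notes": "n/a", "Mentor_ID": "M001", "Mentee_Name": "Ada Obi"},
    {"Notes": "", "Mentor_ID": "M002", "Mentee_Name": "Jane Roe"},
]

trimmed = drop_column(with_extra, "Notes")
print(trimmed)
```

Because the helper builds new dictionaries, the original rows stay untouched, which mirrors keeping the raw workbook alongside the cleaned one.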

Conclusion

The dataset has been cleaned and standardized, ensuring it is now ready for effective analysis. These steps will help improve the accuracy and reliability of insights derived from this data.

