Guhan Kabbina
Harshita Vidapanakal
Hanuraag Baskaran
Rohan M
This repository contains source code for the following projects:
Run the script files present in the config folder to install both Hadoop and Spark on your Linux machine.
Run the requirements script files present in the config folder to install all the required libraries for all the projects in this repository.
The required data files for all the projects are present in the data folder.
The data files are pre-processed and only a sample of the data is stored; a link to the entire dataset is provided in the data\README.md file.
The source code for all the projects is present in the src folder.
Please read the documentation and report to understand how the code works.
Run the respective script files present in the tools folder for each project.
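As a rough illustration of the step above, the sketch below lists the scripts in a folder and runs one of them. It assumes the scripts are shell scripts with a `.sh` extension and that `bash` is available; neither is stated in this README. The helper names `find_scripts` and `run_script` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path


def find_scripts(folder, suffix=".sh"):
    """Return the files in `folder` that end with `suffix`, sorted by name.

    The ".sh" default is an assumption about how the scripts are named.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == suffix
    )


def run_script(script, shell="bash"):
    """Run a single script and raise CalledProcessError if it fails."""
    print(f"Running {script}")
    subprocess.run([shell, str(script)], check=True)
```

From the repository root, `find_scripts("tools")` would show the available scripts, and `run_script(...)` can then be called on the one that belongs to the project you want to run.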
The output for each project is present in the sample folder.
Pre-trained models for Spam_Ham_Classification are present in the build folder. Use the test file src\Spam_Ham\models\model_test.py to classify emails with them.
The performance analysis of the models in the projects is provided in the report\images folder.
Please raise a GitHub issue if you have any questions or suggestions.