This project is a sentiment analysis tool that classifies product reviews as positive or negative. The tool was built using a machine learning model trained on a dataset of tweets, and it is deployed as a web application using Flask.
- Preprocesses text data by converting it to lowercase, removing punctuation, removing stopwords, and stemming.
- Uses RandomizedSearchCV to find the best model and hyperparameters.
- Selects Logistic Regression as the best model and saves it as a pickle file.
- Provides a Flask web application where users submit product reviews and receive sentiment predictions.
- Displays the total number of positive and negative reviews.
The dataset contains tweets labeled as either positive or negative. It is preprocessed to remove noise before the machine learning model is trained.
The preprocessing steps include:
- Converting all text to lowercase.
- Removing all punctuation.
- Removing stopwords using the nltk library.
- Applying stemming to reduce words to their root forms.
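The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `preprocess` and `make_nltk_preprocessor` are hypothetical, and the choice of `PorterStemmer` is an assumption, since the stemmer used is not stated here.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stop_words, stem):
    """Lowercase, strip punctuation, drop stopwords, then stem each token.

    `stop_words` is any set of words to discard and `stem` is any
    callable mapping a word to its root form (both are injected so the
    function can be tried without NLTK installed).
    """
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(stem(tok) for tok in tokens)


def make_nltk_preprocessor():
    """Build a one-argument preprocessor backed by NLTK.

    Assumption: English stopwords and the Porter stemmer.
    """
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)  # fetches the corpus if missing
    stop_words = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    return lambda text: preprocess(text, stop_words, stemmer.stem)
```

With NLTK installed, `clean = make_nltk_preprocessor()` gives a function that can be applied to each tweet or review. Because punctuation is removed before stopword filtering, contractions such as "don't" become "dont" and may no longer match the stopword list.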
The preprocessed data is split into training and test sets. The following models were considered:
- Logistic Regression
- Multinomial Naive Bayes
- Decision Tree Classifier
- Support Vector Classifier (SVC)
RandomizedSearchCV was used to tune hyperparameters and select the best model. Logistic Regression was the best-performing model and was saved as a pickle file for later use.
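One way to search across all four candidates in a single `RandomizedSearchCV` run is sketched below. Several details are assumptions made for illustration rather than taken from the project: the TF-IDF vectorizer, the hyperparameter ranges, the 80/20 split, the accuracy metric, and the names `train_and_save` and `model.pkl`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def train_and_save(texts, labels, model_path="model.pkl", seed=0):
    """Pick the best model by randomized search and pickle it."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )

    # The "clf" step is a placeholder that the search swaps out.
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),  # assumed feature extraction
        ("clf", LogisticRegression()),
    ])

    # One dict per candidate model; the value ranges are illustrative only.
    param_distributions = [
        {"clf": [LogisticRegression(max_iter=1000)],
         "clf__C": [0.01, 0.1, 1, 10]},
        {"clf": [MultinomialNB()],
         "clf__alpha": [0.1, 0.5, 1.0]},
        {"clf": [DecisionTreeClassifier(random_state=seed)],
         "clf__max_depth": [5, 10, None]},
        {"clf": [SVC()],
         "clf__C": [0.1, 1, 10],
         "clf__kernel": ["linear", "rbf"]},
    ]

    search = RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=10,
        cv=3,
        scoring="accuracy",
        random_state=seed,
    )
    search.fit(X_train, y_train)

    best_model = search.best_estimator_  # refit on the full training set
    with open(model_path, "wb") as f:
        pickle.dump(best_model, f)

    # Held-out accuracy of the selected pipeline.
    return best_model, best_model.score(X_test, y_test)
```

Pickling the whole pipeline, rather than the classifier alone, keeps the vectorizer and the model together, so the web application can pass text straight to `predict`.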
The Flask web application allows users to:
- Submit a product review.
- Receive a sentiment analysis prediction (positive or negative).
- View all submitted reviews along with the count of positive and negative reviews.
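A minimal sketch of how these three capabilities might map onto Flask routes is shown below. It is not the project's actual application: the `/reviews` endpoint, the JSON payload shape, the in-memory list, and the `create_app` factory are all hypothetical, and the model is assumed to accept raw text and return the string labels `"positive"` or `"negative"`.

```python
import pickle

from flask import Flask, jsonify, request


def create_app(model):
    """Build the app around any object exposing predict(list_of_texts)."""
    app = Flask(__name__)
    reviews = []  # assumption: in-memory storage, lost on restart

    @app.route("/reviews", methods=["POST"])
    def submit_review():
        payload = request.get_json(silent=True) or {}
        text = str(payload.get("review", "")).strip()
        if not text:
            return jsonify(error="review text is required"), 400
        # If the model expects preprocessed text, apply the same
        # preprocessing used during training before predicting.
        sentiment = str(model.predict([text])[0])
        reviews.append({"review": text, "sentiment": sentiment})
        return jsonify(review=text, sentiment=sentiment), 201

    @app.route("/reviews", methods=["GET"])
    def list_reviews():
        counts = {"positive": 0, "negative": 0}
        for item in reviews:
            counts[item["sentiment"]] = counts.get(item["sentiment"], 0) + 1
        return jsonify(reviews=reviews, counts=counts)

    return app


if __name__ == "__main__":
    # "model.pkl" is a hypothetical path to the pickled model.
    with open("model.pkl", "rb") as f:
        create_app(pickle.load(f)).run(debug=True)
```

Passing the model into `create_app` instead of loading it at import time makes it easy to exercise the routes with Flask's test client and a stub model.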
Contributions are welcome! Please fork this repository and submit a pull request for any improvements or bug fixes.