Skip to content

sudam802/NLP_tutorials

Repository files navigation

NLP_tutorials

Natural language processing tutorials

The provided code is a machine learning pipeline to classify disaster-related tweets. It begins by importing necessary libraries and loading the training and test datasets. The tweets are preprocessed by removing articles ("a", "an", "the") using a regular expression. A CountVectorizer is then used to convert the text data into a matrix of token counts. The RidgeClassifier model is trained on this transformed data, and its performance is evaluated using 3-fold cross-validation with the F1 scoring metric. Finally, the model is used to make predictions on the test dataset, and an example tweet is printed to demonstrate preprocessing.

About

Natural language processing tutorials

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published