Content-Categerization-Library

The pipeline combines several libraries with techniques implemented by hand in Python to handle multi-class text classification problems. It aims for higher accuracy than a plain TF-IDF feature extraction pipeline, and it can be used to get baseline scores for text categorisation.

Algorithms used in the pipeline:

Filtering --

Zipf's law and chi-square filters
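The two filters can be sketched as follows. This is a minimal illustration, not the code in zipf_.py or chi_sqaure_filter.py: the function names, the document-frequency thresholds, and the toy corpus are assumptions, and the chi-square score uses the common one-vs-rest 2x2 contingency form, which may differ from what the repository does.

```python
from collections import Counter


def zipf_filter(docs, min_df=2, max_df_ratio=0.75):
    """Drop the extremes of the Zipf curve: very rare and very common tokens."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each token appear?
    df = Counter(tok for doc in docs for tok in set(doc))
    return {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio}


def chi_square_scores(docs, labels, vocab):
    """Score each token by its largest one-vs-rest chi-square statistic."""
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    scores = {}
    for tok in vocab:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(doc_sets, labels))
            a = sum(1 for s, y in pairs if tok in s and y == cls)      # token, class
            b = sum(1 for s, y in pairs if tok in s and y != cls)      # token, other
            c = sum(1 for s, y in pairs if tok not in s and y == cls)  # no token, class
            d = n - a - b - c                                          # no token, other
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[tok] = best
    return scores


# Toy corpus (hypothetical data, already tokenised).
docs = [
    ["the", "match", "ended", "today"],
    ["the", "team", "won", "the", "match"],
    ["the", "market", "fell", "today"],
    ["the", "market", "rallied"],
]
labels = ["sport", "sport", "finance", "finance"]

kept = zipf_filter(docs)                       # frequency-band filter first
scores = chi_square_scores(docs, labels, kept)  # then class-dependence scores
# Keep the two highest-scoring tokens; ties are broken alphabetically.
selected = sorted(scores, key=lambda t: (-scores[t], t))[:2]
print(selected)
```

On this toy corpus, "the" is dropped for being too common, single-occurrence tokens are dropped for being too rare, and "today" scores zero because it appears equally often in both classes.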

Feature Extraction --

TF-IDF, bag of words, GloVe vectors, novel document vector aggregation using GloVe vectors
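The README does not describe how the novel aggregation works, so the sketch below shows only one common way to turn word vectors into a document vector: an IDF-weighted average. The function names, the 2-dimensional toy embeddings (stand-ins for real GloVe vectors), and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def doc_vector(doc, embeddings, idf, dim):
    """IDF-weighted mean of the word vectors of the tokens in `doc`."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in doc:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        w = idf.get(tok, 1.0)
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical 2-d embeddings; real GloVe vectors have 50 to 300 dimensions.
embeddings = {"match": [1.0, 0.0], "market": [0.0, 1.0], "the": [0.5, 0.5]}
docs = [["the", "match"], ["the", "market"]]

idf = idf_weights(docs)
vectors = [doc_vector(doc, embeddings, idf, dim=2) for doc in docs]
unknown = doc_vector(["unseen"], embeddings, idf, dim=2)
print(vectors)
```

Weighting by IDF pulls each document vector towards its rarer, more distinctive tokens ("match", "market") and away from tokens that occur everywhere ("the").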

Modelling --

Tree-based algorithms -->

LightGBM, XGBoost, Random Forest

Linear algorithms -->

Logistic regression, SGD classifier

The pipeline outputs a classification report with scores for each individual class, along with the corresponding weighted precision and recall.
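To make "weighted" concrete: each class's precision and recall is averaged using the number of true examples of that class (its support) as the weight, which is what scikit-learn's classification_report labels "weighted avg". The sketch below computes this by hand; the function name and the toy labels are hypothetical, and whether the repository uses scikit-learn's report or its own code is not stated here.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Per-class precision/recall plus support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for label in labels:
        per_class[label] = {
            "precision": correct[label] / predicted[label] if predicted[label] else 0.0,
            "recall": correct[label] / support[label] if support[label] else 0.0,
            "support": support[label],
        }

    total = len(y_true)
    weighted = {
        metric: sum(per_class[l][metric] * per_class[l]["support"] for l in labels) / total
        for metric in ("precision", "recall")
    }
    return per_class, weighted


# Toy predictions (hypothetical data).
y_true = ["sport", "sport", "finance", "finance", "tech"]
y_pred = ["sport", "finance", "finance", "finance", "tech"]

per_class, weighted = classification_scores(y_true, y_pred)
for label, row in per_class.items():
    print(label, row)
print("weighted", weighted)
```

Support-weighted recall always equals overall accuracy, so it is most informative when read next to the per-class rows, where a weak minority class remains visible.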

Repository files:

Content-Categerization-Library
README.md
chi_sqaure_filter.py
feature_extraction.py
main.py
models_mod_.py
preprocess_.py
zipf_.py

pranay5255/Content-Categerization-Library
