Topic Detection
Subhasis Dutta edited this page Mar 29, 2016 · 1 revision
The topic detection algorithm categorizes a given text into one of the 22 trained categories listed below:
Advertising, Beauty, Business, Celebrity, Diy craft, Entertainment, Family, Fashion, Food, General, Health, Lifestyle, Music, News, Pop, Culture, Social, Media, Sports, Technology, Travel, Video Games.
### Packages/Algorithms used
- Word2Vec vectors trained on the Google News corpus
- Gensim - To read the binary Word2Vec vectors
- Twokenize - To extract text and emoticons from Twitter
- RAKE - Keyword extraction
- scikit-learn - K-means clustering of Word2Vec vectors across the above-mentioned categories
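
The page does not spell out how these pieces fit together, so the following is only a minimal sketch of one plausible reading: keywords extracted from a text are mapped to word vectors, averaged, and assigned to the category whose centroid is closest by cosine similarity. The toy 3-dimensional vectors, the `CENTROIDS` values, the helper names (`mean_vector`, `cosine`, `detect_topic`) and the fallback to `General` are all assumptions made for illustration. In a real setup the vectors would presumably come from Gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` and the centroids from a K-means fit, and tokenization and keyword extraction (Twokenize, RAKE) would run before this step.

```python
import math

# Toy 3-dimensional vectors standing in for the Google News Word2Vec
# vectors. The values are made up for illustration only.
TOY_VECTORS = {
    "goal":     [0.9, 0.1, 0.0],
    "match":    [0.8, 0.2, 0.1],
    "recipe":   [0.1, 0.9, 0.0],
    "pasta":    [0.0, 0.8, 0.2],
    "laptop":   [0.1, 0.0, 0.9],
    "software": [0.0, 0.2, 0.8],
}

# Hypothetical category centroids. In practice these might be cluster
# centers produced by K-means over labelled word vectors (an assumption).
CENTROIDS = {
    "Sports":     [1.0, 0.0, 0.0],
    "Food":       [0.0, 1.0, 0.0],
    "Technology": [0.0, 0.0, 1.0],
}


def load_vectors(path):
    """Load binary Word2Vec vectors with Gensim (not called in this sketch)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)


def mean_vector(words, vectors):
    """Average the vectors of all known words; return None if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_topic(keywords, vectors, centroids, fallback="General"):
    """Return the category whose centroid is most similar to the keywords."""
    text_vec = mean_vector(keywords, vectors)
    if text_vec is None:
        # No keyword has a vector; falling back to "General" is an assumption.
        return fallback
    return max(centroids, key=lambda c: cosine(text_vec, centroids[c]))


if __name__ == "__main__":
    print(detect_topic(["goal", "match"], TOY_VECTORS, CENTROIDS))
    print(detect_topic(["pasta", "recipe"], TOY_VECTORS, CENTROIDS))
```

Averaging keyword vectors keeps the sketch short; weighting keywords by their RAKE score would be a natural variation, though the page does not say whether the original implementation does so.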