TGINE_reddit_v1

This program extracts posts and comments data from Reddit's subreddit r/AskScience and performs basic tf/idf vectorization. Four types of extraction are defined as per Reddit browing options:

new
top
rising
hot

Execution

Requires one parameter: limit of type int

Limit is the max number of posts retrieved from reddit.com (max allowed by library is 1000)

Usage: TGINE_reddit_v1.py limit

Only creates each file if it does not exist beforehand. Otherwise reads from existing file.

Output

The output is:

One json file for each mode (new, top, rising, hot) if they didn't exist before execution
Screen output for each file. Order is new, top, rising, hot

TGINE_twitter_v1

This program extracts posts from 4 Twitter users and performs classification and basic sentiment analysis. For classification, an SVM model with different parameters is used, and only the best parameters for the training set are used to test on new data.

Execution

No parameters

Output

The output is:

One json file for each user
Screen output for each dataset (2 users each dataset): parameters, accuracy and confusion matrix (train and test)
Screen output for each user: ratio of positive and negative words

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
TGINE_reddit_v1.py		TGINE_reddit_v1.py
TGINE_twitter_v1.py		TGINE_twitter_v1.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TGINE_reddit_v1

Execution

Output

TGINE_twitter_v1

Execution

Output

About

Releases

Packages

Languages

Raquelie/information_retrieval

Folders and files

Latest commit

History

Repository files navigation

TGINE_reddit_v1

Execution

Output

TGINE_twitter_v1

Execution

Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages