Commit
Add files via upload
yarenozyer authored Apr 21, 2024
1 parent 5649052 commit 75b3079
Showing 19 changed files with 10,920 additions and 0 deletions.
1 change: 1 addition & 0 deletions .env
@@ -0,0 +1 @@
OPENAI_API_KEY="<redacted>"
396 changes: 396 additions & 0 deletions atheism_translated.txt

Large diffs are not rendered by default.

397 changes: 397 additions & 0 deletions atheism_without_none.csv

Large diffs are not rendered by default.

406 changes: 406 additions & 0 deletions atheism_without_none.txt

Large diffs are not rendered by default.

227 changes: 227 additions & 0 deletions climate_translated.txt

Large diffs are not rendered by default.

228 changes: 228 additions & 0 deletions climate_without_none.csv

Large diffs are not rendered by default.

539 changes: 539 additions & 0 deletions feminism_without_none.csv

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions output.csv
@@ -0,0 +1,41 @@
Tweet
"@tedcruz And, #HandOverTheServer she wiped clean + 30k deleted emails, explains dereliction of duty/lies re #Benghazi,etc #tcot"
Hillary is our best choice if we truly want to continue being a progressive nation. #Ohio
"@TheView I think our country is ready for a female pres, it can't ever be Hillary"
I just gave an unhealthy amount of my hard-earned money away to the big gov't & untrustworthy IRS. #WhyImNotVotingForHillary
@PortiaABoulger Thank you for adding me to your list
Hillary can not win. Here's hoping the Dems offer a real candidate like Warren. #Warren2016
"Respect FOR the law and respect BY the law Yes, needed desperately. #BaltimoreRiots"
I don't want to be appointed to an Ambassador post.
"#StopHillary2016 @HillaryClinton if there was a woman with integrity and honesty I would vote for such as woman president, NO"
@HillaryClinton End lawless #ClintonFoundation. Jail Butcher of #Benghazi. #Arrest rapist #BillClinton. #HillaryClinton
"Use your brain, keep Hillary out of the White House.Clinton2016"
@HillaryClinton Hillary pandering with her logo. #ClintonFoundationscandal #ClintonCash
"@readyforHRC @HillaryClinton #HillaryClinton, the US presidency is a testament to the success of #women their role in the world"
@CiaraAntaya cuz you know I'm such a feminist
2 million bogus followers on Twitter @HillaryClinton #WhyImNotVotingForHillary
@lindasuhler : My name is Rebecca and my grandmother immigrated to Sunnybrook Farm. @twitchyteam
Where's the campaign store is the real question? I am ready to buy some Hillary gear
"It's a miracle, suddenly #Democrats don't mind having someone who voted for war."
@smileitsalicia @greekgummybear2 now i can live in peace
Hillary doesn't want to put anyone in prison anymore. Obviously worried about her own future.
The only way I support Hillary was if Elizabeth Warren ran or Karl Marx was running #2016 #Clinton2016
@HomeOfUncleSam @ScotsFyre @RWNutjob1 @SA_Hartdegen She's too old to understand the internet...that she can be fact checked.
Because Communist Breadlines are not my thing! #NoHillary #WhyImNotVotingForHillary
"@HillaryClinton bad wife, bad role model for women, bad lawyer, bad First Lady, bad Senator, horrible Secretary of State."
"Everything Hillary touches ends up being a scam, a lie, a cover-up, or a failure. William L. Just who we want as president."
Yes HRC subject 2 dbl standard Smh Come on @billclinton @HillaryClinton U Knew @ClintonFdn Donations Would b Scrutinized; Spun!
#Hillary to stop for #pizza today to garner the #Italian vote. #MSM is worthless. #libertynothillary #HillNo
I want America to great again #WhyImNotVotingForHillary
"March 8, 2016 Ohio is holding our Primaries! The date is subject to change. #Ohio #OurChampion"
@RIGHTZONE @WethePeoplePets Let's hope the VOTERS remember! #HillNo
Hillary Clinton has not driven a car since 1996. #clintonfakerealityshow
"@NaughtyBeyotch @TheRealMadman23 Don't care for #Fiorina, but it seems she taking the #sexist gut punches by the #media. #MSM"
"@FoxNews @marthamaccallum @BillHemmer whose the opportunist now, #NoHillary2016"
@HillaryClinton @WomenintheWorld we need to re-establish a #global system dominated by love and affection have #moral_humane RT
#Hillary is as transparent as a brick wall #LibertyNotHillary
@WSJ . Clinton Foundation to keep accepting bribes from foreign governments #WhyImNotVotingForHillary
"@josephbenning I agree, these are better than what you had before, like a severe cold is better than pneumonia. Good luck."
"What are you afraid of @HillaryClinton? If you can't answer questions from the press, why do we want you as POTUS"
"Sorry, Hillary's new normal folk image doesn't take away from Behgnazi & her 0 foreign policy successes as Secretary of State."
CEO pay the target for 2016 election. From someone who makes more than most CEO's but you drank the Kool-Aid.
233 changes: 233 additions & 0 deletions output.txt

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions parse_data.py
@@ -0,0 +1,16 @@
import pandas as pd


def write_tweets_to_txt(file):
    df = pd.read_csv(file, encoding='ISO-8859-1')
    column_data = df["Tweet"]
    row_count = 0

    with open('atheism_without_none.txt', 'w', encoding='ISO-8859-1') as f:
        for value in column_data:
            if(row_count % 43 == 0):
                f.write('********************************************' + '\n')
            f.write(str(value) + '\n')
            row_count += 1

write_tweets_to_txt('atheism_without_none.csv')
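parse_data.py hardcodes the output file name, the column, and the block size of 43. A minimal sketch of a parameterized variant follows, assuming the standard-library `csv` module in place of pandas so it runs without extra dependencies. The function name `write_column_in_blocks` and its parameters are hypothetical and do not appear in the commit. One behavioral difference: empty cells come out as empty lines here, whereas the pandas version would write `nan`.

```python
import csv

# Same separator literal as in parse_data.py.
SEPARATOR = '********************************************'


def write_column_in_blocks(csv_path, txt_path, column="Tweet",
                           block_size=43, encoding="ISO-8859-1"):
    """Copy one CSV column to a text file, one value per line.

    A separator line is written before every `block_size` values,
    mirroring the `row_count % 43 == 0` check in parse_data.py.
    Returns the number of values written.
    """
    count = 0
    with open(csv_path, newline="", encoding=encoding) as src, \
            open(txt_path, "w", encoding=encoding) as dst:
        for row in csv.DictReader(src):
            if count % block_size == 0:
                dst.write(SEPARATOR + "\n")
            dst.write(row[column] + "\n")
            count += 1
    return count
```

With this shape the same function could be pointed at `climate_without_none.csv` or `feminism_without_none.csv` by changing only the arguments, for example `write_column_in_blocks('climate_without_none.csv', 'climate_without_none.txt')`.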
1 change: 1 addition & 0 deletions test.csv

Large diffs are not rendered by default.

61 changes: 61 additions & 0 deletions testSVM.py
@@ -0,0 +1,61 @@
import numpy as np
from nltk.tokenize import word_tokenize
from snowballstemmer import TurkishStemmer
import string
from sklearn.feature_extraction.text import TfidfVectorizer

import pandas as pd

#stemmer = TurkishStemmer()

def read_turkish_tweets(file):
    df = pd.read_csv(file, encoding='windows-1254')
    tweets = df["Tweet"].tolist()
    targets = df["Target"].tolist()
    return tweets, targets

def detect_stopwords():
    stopwords_df = pd.read_csv('turkish', header=None)
    stop_words = stopwords_df[0].tolist()
    #stop_words = stopwords.words('turkish')
    stop_words.extend(string.punctuation)
    stop_words.extend(["vs.", "vb.", "a", "i", "e", "rt", "#semst", "semst"])
    stop_words = set(stop_words)
    return stop_words


def tokenize_tweet(tweet):
    # Tokenization
    tokens = word_tokenize(tweet)
    stop_words = detect_stopwords()
    normalized_tokens = [token.lower() for token in tokens]
    filtered_tokens = [token for token in normalized_tokens if (token not in stop_words and not token.startswith("http"))]
    #stemmed_tokens = [stemmer.stemWord(token) for token in filtered_tokens]

    return filtered_tokens

def extract_features_tfidf(tweets):
    # Tokenization and preprocessing
    tokenized_tweets = [tokenize_tweet(tweet) for tweet in tweets]

    # Feature extraction: n-grams
    tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 3), analyzer='word')
    tfidf_features = tfidf_vectorizer.fit_transform([' '.join(tokens) for tokens in tokenized_tweets])

    # Feature extraction: character n-grams
    char_tfidf_vectorizer = TfidfVectorizer(ngram_range=(2, 5), analyzer='char')
    char_tfidf_features = char_tfidf_vectorizer.fit_transform([' '.join(tokens) for tokens in tokenized_tweets])

    # Feature extraction: sentiment lexicon features, target presence/absence, POS tags, encodings
    # These features remain the same as before

    # Combine all features
    all_features = np.concatenate((tfidf_features.toarray(), char_tfidf_features.toarray()), axis=1)
    np.savetxt('feature_matrix.csv', all_features, delimiter=',')
    return all_features

train_tweets, train_targets = read_turkish_tweets('translated_train_without_none.csv')

print(extract_features_tfidf(train_tweets))

#print(tokenize_tweet(train_tweets[0]))
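In testSVM.py, `tokenize_tweet` calls `detect_stopwords()` for every tweet, so the `turkish` stop-word file is re-read once per tweet. A minimal sketch of loading the list once and reusing it follows. The names `load_stop_words` and `filter_tokens` are hypothetical, the file is assumed to hold one stop word per line in UTF-8, and plain file reading stands in for `pd.read_csv` so the sketch needs only the standard library.

```python
import string
from functools import lru_cache

# Extra entries copied from detect_stopwords() in testSVM.py.
EXTRA_STOP_WORDS = ("vs.", "vb.", "a", "i", "e", "rt", "#semst", "semst")


@lru_cache(maxsize=None)
def load_stop_words(path="turkish", encoding="utf-8"):
    """Read the stop-word file once per path; later calls hit the cache."""
    with open(path, encoding=encoding) as f:
        words = {line.strip() for line in f if line.strip()}
    words.update(string.punctuation)   # each punctuation character
    words.update(EXTRA_STOP_WORDS)
    return frozenset(words)


def filter_tokens(tokens, stop_words):
    """Lowercase tokens, then drop stop words and anything starting with 'http'."""
    lowered = (token.lower() for token in tokens)
    return [t for t in lowered
            if t not in stop_words and not t.startswith("http")]
```

Inside `tokenize_tweet`, the output of `word_tokenize(tweet)` could then be passed to `filter_tokens(tokens, load_stop_words())`, which keeps the filtering logic unchanged while avoiding the repeated file read.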