Commit 2dda6f6 (0 parents): 12 changed files with 17,549 additions and 0 deletions.
# 3daffect
### Creation and validation of a 3-dimensional sentiment dictionary

A weighted 3-dimensional sentiment dictionary for quantifying affect in text samples. The dictionary covers approximately 2 million tokens in the pre-trained (Common Crawl) fastText word vector embedding. These word vectors are not included in the repository because of their size, but they are available [here](https://fasttext.cc/docs/en/english-vectors.html) and are needed to re-run some parts of the code in this repository.

A radial basis function support vector regression was trained to predict ratings of 166 mental state words on 3 principal component dimensions (rationality vs. emotionality, social impact, and valence [+/-]) from their 300-dimensional fastText embeddings. In 5-fold cross-validation this regression achieved relatively high accuracy: r = .86, .85, and .91, and RMSE = .60, .60, and .51, respectively, vs. a chance-level RMSE of 1 (the standard deviation of the scores). An SVM-R trained on all 166 state words was then used to impute 3-d affect scores to every token in the fastText vocabulary, creating a weighted dictionary.
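The chance level of 1 is the RMSE of a model that always predicts the mean: on scores standardized to a standard deviation of 1, that error equals the (population) standard deviation. The Python sketch below illustrates this arithmetic; `standardize` is an illustrative helper rather than part of this repository, and the "model" predictions are invented, not output from the SVR.

```python
import math


def standardize(xs):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]


def rmse(pred, obs):
    """Root-mean-square error, mirroring the rmse() helper in the R script."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# valence of the first five state words in the component file, standardized here
obs = standardize([1.470302876, 1.908797332, -0.874492512, -0.990128624, 0.100091573])

# chance model: always predict the mean, which is 0 after standardizing
baseline = [0.0] * len(obs)
print(round(rmse(baseline, obs), 6))  # 1.0

# hypothetical model: shrinks toward 0 and adds small invented errors
preds = [0.8 * o + e for o, e in zip(obs, [0.1, -0.2, 0.05, 0.15, -0.1])]
print(round(rmse(preds, obs), 2))  # 0.29
```

Any useful model has to land below the baseline of 1, which is why RMSEs of .51 to .60 count as a substantial improvement.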

The resulting weighted dictionary was validated in two ways. First, the dictionary scores of individual words on the 3 dimensions were correlated with human ratings on approximately matched dimensions (dominance, arousal, and valence) for nearly 14k words normed in [Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207.](http://crr.ugent.be/archives/1003) The resulting correlations were .57 for dominance with rationality, .27 for arousal with social impact, and .67 for valence with valence. This reliable out-of-sample prediction, particularly for the exact match of valence to valence, suggests that the dictionary creation method was largely successful.
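The first validation step amounts to a Pearson correlation computed over the words present in both the dictionary and the norms. A minimal Python sketch, assuming each source is available as a word-to-score mapping for one dimension pair; `validate_dimension` is a hypothetical helper, and the words and values below are made up. Because Pearson's r is unaffected by linear rescaling, the two sources need not share a rating scale.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validate_dimension(dict_scores, norms):
    """Correlate dictionary scores with human ratings over their shared words.

    Both arguments map word -> score on one dimension. Returns (r, n_shared).
    """
    shared = sorted(set(dict_scores) & set(norms))
    xs = [dict_scores[w] for w in shared]
    ys = [norms[w] for w in shared]
    return pearson_r(xs, ys), len(shared)


# made-up values: dictionary valence (centered near 0) vs. ratings on another scale
dict_valence = {"happy": 1.2, "sad": -0.9, "calm": 1.0, "angry": -1.1, "table": 0.1}
rated_valence = {"happy": 8.5, "sad": 2.1, "calm": 6.9, "angry": 2.5, "chair": 5.0}
r, n = validate_dimension(dict_valence, rated_valence)
print(n, round(r, 2))  # 4 0.98
```

Words covered by only one source ("table", "chair") are simply left out of the correlation.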

Second, the 3d affect dictionary was compared with the 14k ratings from Warriner et al. in scoring extended pieces of text: sentences from Amazon reviews, entire IMDb reviews, and sentences from Yelp reviews, curated as part of [From Group to Individual Labels using Deep Features, Kotzias et al., KDD 2015](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). Each sentence or review is labeled 1/0 for positive vs. negative sentiment, and we attempted to predict this label using the valence dimension of each dictionary. Results favored the 3d affect dictionary over the 14k human-rated words for two of the three validation sets, despite the far smaller set of words originally normed for the affect dictionary (166 state words). Accuracies (3d affect vs. Warriner et al.) were: Amazon, 60% vs. 69%; IMDb, 73% vs. 70%; and Yelp, 73% vs. 69%. This advantage arose partly, though not entirely, because the 3d affect dictionary's large vocabulary allowed it to score every piece of text.
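One way to turn a weighted dictionary into a sentence-level classifier is to average the valence of the tokens it covers and threshold the mean. The Python sketch below does this under assumptions the README does not spell out: lower-case regex tokenization, an unweighted mean, a threshold of 0 (suited to scores centered on zero), and texts with no dictionary tokens counted as errors. The dictionary entries and sentences are invented.

```python
import re


def score_text(text, valence):
    """Mean valence of the tokens in `text` that the dictionary covers, or None."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [valence[t] for t in tokens if t in valence]
    return sum(hits) / len(hits) if hits else None


def accuracy(texts, labels, valence, threshold=0.0):
    """Share of texts whose thresholded mean valence matches the 1/0 label.

    Assumption: texts with no dictionary tokens cannot be scored and count as errors.
    """
    correct = 0
    for text, label in zip(texts, labels):
        score = score_text(text, valence)
        if score is not None and int(score > threshold) == label:
            correct += 1
    return correct / len(texts)


# invented dictionary entries and review sentences
valence = {"great": 1.5, "love": 1.8, "terrible": -1.6, "broke": -0.7}
texts = ["I love it, great phone", "Terrible, it broke in a day", "Arrived on Tuesday"]
labels = [1, 0, 1]
print(accuracy(texts, labels, valence))  # 0.6666666666666666
```

Under the last assumption, vocabulary coverage matters directly: the third sentence contains no dictionary tokens, so it costs accuracy no matter how good the scores for other words are.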

Please cite the paper that originally derived the 3-dimensional model of affect from patterns of brain activity associated with mental state representation: [Tamir, D. I., Thornton, M. A., Contreras, J. M., & Mitchell, J. P. (2016). Neural evidence that three dimensions organize mental state representation: Rationality, social impact, and valence. Proceedings of the National Academy of Sciences of the United States of America, 113(1), 194-199.](http://markallenthornton.com/cv/TamirThornton_PNAS_2016.pdf)

This dictionary was built as part of [Methods in Neuroscience at Dartmouth (MIND)](https://summer-mind.github.io/index.html), 2018.
```r
# load packages -----------------------------------------------------------

if (!require(e1071)) install.packages("e1071"); library(e1071)
library(data.table)  # provides fread(), used below to read the raw .vec file

# define functions --------------------------------------------------------

# root-mean-square error between predictions x and observations y
rmse <- function(x, y){
  return(sqrt(mean((x - y)^2)))
}

# load (small) data -------------------------------------------------------

# principal component scores of the 166 mental state words
pcs <- read.csv("pc166.csv")
states <- as.character(pcs$states)
pcs <- pcs[, 2:4]
pcs[, 1] <- -pcs[, 1]  # flip the first component so that higher = more rational
rownames(pcs) <- states
#pcs <- apply(pcs,2,rs01)

# fastText vectors for the state words (written by extract_ms_vec.py)
svecs <- read.csv("state_vectors.csv", header = FALSE)
rownames(svecs) <- svecs$V1
svecs <- svecs[, 2:301]
stopifnot(all(rownames(svecs) == rownames(pcs)))  # rows must align word for word

# test method on state words ----------------------------------------------

# svm-regression version: 5-fold cross-validation, one RBF SVR per component
set.seed(1)
cvinds <- sample(rep(1:5, 34)[1:166])  # random fold label (1-5) for each word
cperf <- matrix(NA, 5, 3)  # correlation per fold (rows) and component (columns)
eperf <- matrix(NA, 5, 3)  # RMSE per fold and component
for (i in 1:5){
  tsel <- cvinds == i
  x.train <- as.matrix(svecs[!tsel, ])
  x.test <- as.matrix(svecs[tsel, ])
  for (j in 1:3){
    # matrix interface, so predict() uses x.test directly
    fit <- svm(x = x.train, y = pcs[!tsel, j], kernel = "radial", cost = 4)
    preds <- predict(fit, x.test)
    cperf[i, j] <- cor(preds, pcs[tsel, j])
    eperf[i, j] <- rmse(preds, pcs[tsel, j])
  }
}
colMeans(cperf)
colMeans(eperf)

# fit full model on all 166 words, one SVR per component
svmlist <- list()
for (i in 1:3){
  svmlist[[i]] <- svm(x = as.matrix(svecs), y = pcs[, i], kernel = "radial", cost = 4)
}

# cycle through vectors ---------------------------------------------------

#fast <- fread("./crawl-300d-2M.vec/crawl-300d-2M.vec",skip = 1)
#save(fast,file="fast.Rdata")

load("fast.Rdata")
fast <- as.data.frame(fast)

tokens <- as.character(fast$V1)
nvec <- nrow(fast)
dict <- matrix(NA, nvec, 3)
start <- proc.time()
blocksize <- 10000  # predict many tokens per call instead of one row at a time
for (b in seq(1, nvec, by = blocksize)) {
  idx <- b:min(b + blocksize - 1, nvec)
  x <- as.matrix(fast[idx, 2:301])
  storage.mode(x) <- "double"  # coerce to a numeric matrix
  for (i in 1:3){
    dict[idx, i] <- predict(svmlist[[i]], x)
  }
  print(max(idx))
  print(proc.time() - start)
}

save(tokens, dict, file = "3daffect.Rdata")

rownames(dict) <- tokens
colnames(dict) <- gsub("\\.", " ", colnames(pcs))  # "social.impact" -> "social impact"
write.csv(dict, "3daffect_dict.csv")

# 2 million tokens with weighted valence vs:
# 407 positive emotion and 501 negative emotion words in LIWC
# 1747 positive and 4086 negative emotion words in qdap
```
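The R script writes the dictionary with `write.csv`, which puts the tokens in an unnamed first column and quotes character fields. A minimal Python sketch for reading that file back, assuming it was written as UTF-8; `load_affect_dict` is a hypothetical helper, and the two sample rows are invented rather than taken from the real output. The optional `vocab` filter avoids keeping all ~2 million rows in memory when only a few words are needed.

```python
import csv
import io


def load_affect_dict(fileobj, vocab=None):
    """Read a dictionary written by R's write.csv into (dims, {token: scores}).

    The first column holds the tokens (its header is empty); the remaining
    columns are the affect dimensions. If `vocab` is given, only those
    tokens are kept.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    dims = header[1:]
    table = {}
    for row in reader:
        token = row[0]
        if vocab is None or token in vocab:
            table[token] = tuple(float(v) for v in row[1:])
    return dims, table


# two invented rows in write.csv's layout (not real dictionary values)
sample = io.StringIO(
    '"","rationality","social impact","valence"\n'
    '"joy",-1.1,0.8,1.9\n'
    '"plan",2.3,0.2,0.1\n'
)
dims, table = load_affect_dict(sample, vocab={"joy"})
print(dims)   # ['rationality', 'social impact', 'valence']
print(table)  # {'joy': (-1.1, 0.8, 1.9)}
```

For the real file, pass `open("3daffect_dict.csv", newline="", encoding="utf-8")` in place of the `StringIO` sample.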
```python
# -*- coding: utf-8 -*-
"""
extract_ms_vec.py
Created on Fri Jul 13 13:45:15 2018
@author: mthornton

Write the fastText vectors of the mental state words listed in pc166.csv
to state_vectors.csv (one row per word: the word, then its 300 values).
"""

import io
import csv

# mental state words: first column of pc166.csv (header row skipped)
with open('pc166.csv', newline='') as dfile:
    reader = csv.reader(dfile)
    header = next(reader)
    states = [row[0] for row in reader]


def load_vectors(fname, whitelist):
    """Return {token: [floats]} for the tokens of a .vec file found in whitelist."""
    whitelist = set(whitelist)  # fast membership tests over ~2M lines
    data = {}
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        n, d = map(int, fin.readline().split())  # header: vocabulary size, dimension
        for line in fin:
            tokens = line.rstrip().split(' ')
            if tokens[0] in whitelist:
                data[tokens[0]] = [float(t) for t in tokens[1:]]
    return data


data = load_vectors('./crawl-300d-2M.vec/crawl-300d-2M.vec', states)
with open('state_vectors.csv', 'w', newline='') as dfile:
    writer = csv.writer(dfile)
    for s in states:
        writer.writerow([s] + data[s])
```
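`load_vectors` reads the `.vec` text format: a header line with the vocabulary size and dimension, then one token per line followed by its values. The final loop raises `KeyError` at `data[s]` if a state word is absent from the embedding. A variant that reports missing words and checks each vector's length against the header could look like the sketch below; `read_vec_subset` is a hypothetical name, and the three-word input is a toy stand-in for the real file.

```python
import io


def read_vec_subset(lines, wanted):
    """Keep the vectors for tokens in `wanted` from fastText .vec text.

    `lines` is any iterable of text lines: the first gives the vocabulary
    size and dimension, each later one is a token followed by its values.
    Returns (vectors, missing), where `missing` lists wanted tokens not found.
    """
    wanted = set(wanted)
    it = iter(lines)
    n, d = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(' ')
        if parts[0] in wanted:
            vec = [float(v) for v in parts[1:]]
            if len(vec) != d:
                raise ValueError(f"{parts[0]!r}: {len(vec)} values, expected {d}")
            vectors[parts[0]] = vec
    missing = sorted(wanted - set(vectors))
    return vectors, missing


# toy 2-dimensional file standing in for crawl-300d-2M.vec
toy = io.StringIO("3 2\nawe 0.1 0.2\nfear -0.3 0.4\nthe 0.0 0.5\n")
vectors, missing = read_vec_subset(toy, ["awe", "fear", "self-pity"])
print(sorted(vectors))  # ['awe', 'fear']
print(missing)          # ['self-pity']
```

Because it accepts any iterable of lines, the same function works on an open file handle for the full embedding.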
```csv
states,rationality,social impact,valence
admiration,0.025735943,0.934132467,1.470302876
affection,0.767795367,1.170277563,1.908797332
agitation,0.462970394,0.687185098,-0.874492512
alarm,-0.246642861,0.975805975,-0.990128624
alertness,-1.280707584,0.942608723,0.100091573
amazement,0.930492177,0.858250183,1.124844325
ambivalence,-0.471409341,-0.915971314,0.161737969
amusement,0.853737842,0.704328792,1.433350513
anger,0.782410082,1.367393318,-0.999438824
annoyance,0.582473308,0.765227922,-0.890200227
anticipation,-0.804949979,0.957129685,0.213377146
anxiety,0.623693035,0.56697936,-0.961400457
apathy,0.33371892,-1.169374348,0.07021582
appreciation,0.036759909,0.621946941,1.513059635
apprehension,-0.223149786,0.149008738,-0.46139021
attention,-2.071314946,1.194434957,0.395945101
awareness,-1.7331338,0.760633882,0.669893114
awe,0.958188864,0.285491368,1.068144643
belief,-1.164222181,-0.055190077,0.733197076
bewilderment,0.292159607,-0.117887844,-0.408392165
bias,-1.291705368,0.537925576,-0.783441722
bitterness,0.902736365,0.054791158,-0.919873541
boredom,0.208353031,-1.6741052,-0.498166666
calmness,0.507993858,-1.730711581,2.096434335
certainty,-1.999659272,-0.237838334,0.626624182
cheerfulness,1.039182597,1.040732284,1.565231309
cognition,-2.60731945,-0.262765207,0.280897804
concern,-0.230170693,0.677796353,0.604795361
confusion,-0.147566988,-0.19840759,-0.651234408
consciousness,-1.425032411,-0.279013366,0.689648498
contemplation,-1.655585131,-1.450600284,0.586508288
contempt,0.161739517,0.527089791,-0.652786184
contentment,0.507450773,-0.693323948,1.519139441
craziness,0.099959294,0.742865406,-1.037267544
curiosity,-0.728232578,0.708204937,0.891215141
decision,-2.912353287,0.463439667,0.290418857
delight,1.112818673,0.7322698,1.66872562
depression,1.134585171,-2.001262803,-0.491519168
derangement,-0.184346664,-0.513263745,-1.230575853
desire,0.488863009,1.380981444,0.988605747
despair,1.100415892,-0.730222971,-0.786994364
disappointment,0.806179654,-0.742747272,-0.726481688
disarray,-0.584116549,-0.332136599,-1.247572119
disbelief,-0.256645308,-0.491204232,-0.621827801
disgust,0.858473605,0.311946472,-1.073709635
distress,0.712297686,0.47941285,-1.109223142
distrust,-0.21655778,0.649425742,-0.767877049
dominance,-1.759922272,1.928856793,-0.412518607
doubt,-0.429298397,-0.705583079,-0.769908968
dread,0.822571661,-0.137657165,-1.02268029
dreaminess,0.735103015,-1.627253626,1.232698315
drowsiness,0.177340652,-2.274985871,-0.252875755
drunkenness,-0.417283191,-0.27763094,-0.863196523
earnestness,-0.404476667,0.112791241,1.053073347
ecstasy,1.126134307,0.800604027,1.045231939
elation,1.073494673,0.58843831,1.409527858
embarrassment,1.031733956,0.93759749,-0.91711799
emotion,1.210577778,0.401021047,0.970995376
empathy,0.723784498,0.3845917,1.661972548
enjoyment,0.935932691,0.795059155,1.777295143
enthusiasm,0.428051508,1.067270888,1.428542338
envy,0.683901194,1.308299495,-0.864815885
exaltation,0.263656941,0.147047278,0.593544531
exasperation,0.303592025,0.253141476,-0.773409123
excitement,0.955982195,1.188626036,1.536104261
exhaustion,-0.011743445,-1.487878285,-0.48808895
expectation,-1.569158964,0.610184619,0.203231102
fascination,-0.037208454,0.757566307,1.088721773
fatigue,-0.050033009,-2.088650741,-0.445218137
fear,0.636304835,0.977200688,-1.149661429
feeling,1.162671176,-0.032149668,1.098617284
frenzy,-0.090129697,1.243834991,-0.851518066
friendliness,0.272890394,1.285582603,2.056562312
frustration,0.493072453,0.693277327,-0.936062849
fury,0.892291071,0.979068273,-0.91685364
gloominess,1.327704318,-1.997903844,-0.369805244
guilt,0.820951937,-0.033434889,-1.039157727
hallucination,-0.1566409,-0.664213273,-1.03754451
happiness,1.277805118,0.928183019,1.717882897
hate,0.764051526,1.199128006,-1.054020357
hope,0.590053519,0.163786244,1.524199869
horror,0.954998966,0.603295938,-1.430461528
humiliation,0.705749249,1.014438552,-1.100852745
humor,0.250919771,1.028003588,1.305810003
hunger,-0.672408113,-0.162040541,-1.102407745
hypnosis,-0.326613767,-1.473702971,-0.149335716
hysteria,0.446137404,0.883212903,-1.317924735
imagination,-0.700668729,-0.251757854,0.909582201
impatience,-0.027108914,0.70817421,-0.976842167
indecisiveness,-1.014938031,-0.6175013,-0.8284021
indifference,-0.222811792,-1.106584471,-0.263220581
insanity,-0.018410371,0.30734046,-1.375482973
inspiration,-0.184287013,0.462821406,1.083639321
intention,-2.360469782,0.071133008,0.169851097
interconnectedness,-0.724569294,0.521660848,0.997159918
interest,-1.174122294,0.851577804,0.968739327
intrigue,-0.721951986,0.589300222,0.772841434
irritation,0.789273414,0.387069228,-0.891181025
jealousy,0.724690361,1.493598668,-0.937239668
judgment,-2.051066083,0.975138778,-0.611428555
laziness,-0.318537991,-2.018456689,-0.493568802
lethargy,-0.133967929,-2.146007407,-0.637809597
loneliness,1.190104587,-1.343440762,-0.423789104
lust,0.593161585,1.87037369,-0.064433537
melancholy,1.065616809,-1.621275676,-0.30645836
memory,-1.678326693,-0.379380457,0.295704699
misery,1.201442192,-1.005439684,-0.615879251
mortification,0.525026445,0.594380276,-1.415757602
nervousness,0.523544216,0.536682638,-0.840736609
objectivity,-2.434491368,-0.486333397,-0.097467945
opinion,-2.240860378,0.444946614,-0.056029523
optimism,0.182095073,0.254426998,1.239824201
outrage,0.624975418,1.366724637,-1.208603422
pain,0.311639229,0.128627509,-1.305662996
panic,0.472117106,1.141795258,-1.253970669
patience,-0.457746501,-1.015309284,1.559286726
peacefulness,0.917163542,-1.558029508,2.095240772
pensiveness,-0.856712666,-1.671660446,0.388304374
pity,0.878068344,-0.312653053,-0.11252334
planning,-3.040647578,0.30379917,0.097109337
playfulness,0.300856326,1.388022582,1.517898961
pleasure,1.033172002,0.963773989,1.666223477
prejudice,-0.719327751,0.977182844,-1.019037088
preoccupation,-1.176342318,-0.373516086,-0.462258258
pride,0.205683604,0.852948563,0.604321435
rage,0.882214652,1.007879331,-1.119140046
reason,-2.588647603,-0.445249356,0.418709918
regret,0.609623306,-0.444317374,-0.960113841
relaxation,0.380209871,-1.696477446,2.062468813
relief,0.650330124,-0.948259161,1.612633886
remorse,0.945827391,-0.644603885,-0.410454495
resentment,0.498443142,0.513393639,-0.857308056
sadness,1.376014495,-1.261007903,-0.376957422
satisfaction,0.081217726,-0.109882404,1.595398554
self-consciousness,-0.670412183,-0.05252964,-0.232127006
self-control,-2.253707557,-0.567294539,0.673667065
self-pity,0.625739869,-1.467834491,-0.702620504
serenity,0.725609505,-1.666420357,1.868379989
seriousness,-0.955755512,-0.429136604,0.082920376
shame,0.988499388,0.190767077,-0.877431917
shock,0.520907678,0.848194883,-0.945998101
skepticism,-1.47700388,-0.280978123,-0.495071272
sleepiness,0.156886642,-2.334579375,0.105937619
sorrow,1.28782441,-1.132810494,-0.484308968
stress,-0.12450088,0.716216123,-1.180044023
stupor,-0.195512437,-1.323397174,-0.713699257
subordination,-1.544128211,0.16431237,-0.709366687
surprise,0.709284281,1.272726663,0.540075561
suspicion,-0.651074179,0.835240268,-0.825442141
sympathy,1.090717155,0.189513793,1.581141338
terror,0.786830149,1.107572377,-1.204229488
thirst,-0.711399009,-0.543800449,-0.87094664
thought,-2.059181394,-0.61006483,0.483747719
tiredness,0.322704099,-2.349754787,-0.284058492
torpor,-0.323254193,-0.737709443,-0.655607838
trance,0.179435771,-1.713174507,-0.027014016
transcendence,-0.296524665,-0.878526428,0.528111678
uncertainty,-0.76020538,-0.600989919,-0.603184049
uneasiness,0.709935592,-0.608034251,-0.671407087
unhappiness,1.147245504,-1.264787326,-0.418512669
vengeance,-0.111756432,1.383179525,-1.087977539
wakefulness,-0.796218287,0.060004176,0.323072393
warmth,0.720756579,0.034332893,1.789463826
weariness,0.290027086,-1.705178624,-0.477876764
woe,0.784806258,-0.959956956,-0.598675717
worry,0.568021766,0.200556564,-0.96988099
```
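In this file, the "rationality" column runs the other way from its name: planning (-3.04) and cognition (-2.61) sit at the low end. The R script negates that column (`pcs[,1] <- -pcs[,1]`) before training. A minimal Python sketch that loads the file and applies the same flip; `load_components` is a hypothetical helper, and the three sample rows are copied from the table above.

```python
import csv
import io


def load_components(fileobj, flip_first=True):
    """Read pc166.csv-style rows into ([dimension names], {state: [scores]}).

    With flip_first=True the first column is negated, as the R script does
    with pcs[,1] <- -pcs[,1], so that higher values mean more rational.
    """
    reader = csv.reader(fileobj)
    header = next(reader)  # states, rationality, social impact, valence
    scores = {}
    for row in reader:
        values = [float(v) for v in row[1:]]
        if flip_first:
            values[0] = -values[0]
        scores[row[0]] = values
    return header[1:], scores


# three rows copied from pc166.csv
sample = io.StringIO(
    "states,rationality,social impact,valence\n"
    "happiness,1.277805118,0.928183019,1.717882897\n"
    "planning,-3.040647578,0.30379917,0.097109337\n"
    "sadness,1.376014495,-1.261007903,-0.376957422\n"
)
dims, scores = load_components(sample)
# rank the sample words from most to least rational after the flip
ranked = sorted(scores, key=lambda s: scores[s][0], reverse=True)
print(dims)    # ['rationality', 'social impact', 'valence']
print(ranked)  # ['planning', 'happiness', 'sadness']
```

To load the full table, pass `open("pc166.csv", newline="")` in place of the sample.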