initiate repo
markallenthornton committed Aug 7, 2018
0 parents commit 2dda6f6
Showing 12 changed files with 17,549 additions and 0 deletions.
Binary file added 3daffect.Rdata
Binary file not shown.
15 changes: 15 additions & 0 deletions README.md
@@ -0,0 +1,15 @@
# 3daffect
### Creation and validation of a 3-dimensional sentiment dictionary

A weighted 3-dimensional sentiment dictionary for quantifying the affect in text samples. The dictionary covers approximately 2 million tokens in the pre-trained (common crawl) fastText word vector embedding. These word vectors (not included in the repository due to size, but available [here](https://fasttext.cc/docs/en/english-vectors.html)) are necessary for re-running some elements of the code included in this repository.

A radial basis function support vector regression (SVM-R) was trained to predict ratings of 166 mental state words on 3 principal component dimensions (rationality vs. emotionality, social impact, and valence [+/-]) based on the 300d fastText embedding. This regression achieved relatively high accuracy in 5-fold cross-validation: r = .86, .85, and .91, respectively; RMSE = .60, .60, and .51, respectively, vs. chance at SD = 1. An SVM-R trained on all 166 state words was then used to impute 3-d affect scores to all words in the fastText corpus, creating a weighted dictionary.
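
The fold assignment and the chance baseline mentioned above can be illustrated in plain Python. This is a minimal sketch, not the repository's code (which is written in R in `create_dictionary.R`): the helper names `assign_folds` and `rmse` and the toy scores are assumptions made for the example. It shows why "chance at SD = 1" is a natural reference point: for standardized scores, always predicting the mean gives an RMSE equal to the standard deviation.

```python
import math
import random
import statistics


def assign_folds(n_items, n_folds=5, seed=1):
    """Give each item a fold label 1..n_folds, as evenly as possible, in random order."""
    labels = [(i % n_folds) + 1 for i in range(n_items)]
    random.Random(seed).shuffle(labels)
    return labels


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# 166 state words split into 5 folds: one fold of 34 words, four of 33
folds = assign_folds(166)
fold_sizes = [folds.count(k) for k in range(1, 6)]

# Toy held-out scores, invented for illustration only
truth = [-1.5, -0.5, 0.0, 0.5, 1.5]
baseline = [statistics.mean(truth)] * len(truth)  # always predict the mean
chance_rmse = rmse(baseline, truth)               # equals the population SD of truth
```

Under this reading, the reported RMSEs of .51 to .60 mean that prediction error was roughly half of what a mean-only predictor would produce.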

The resulting weighted dictionary was validated in two ways. First, dictionary scores of individual words on the 3 dimensions were correlated with approximately matched human ratings of dominance, arousal, and valence across nearly 14k words normed in [Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207.](http://crr.ugent.be/archives/1003). Resulting correlations were .57 for dominance-rationality, .27 for arousal-social impact, and .67 for valence-valence. This reliable out-of-sample prediction, particularly for the exact dimension match of valence to valence, suggests that the dictionary creation method was largely successful.
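
The word-level validation amounts to correlating two sets of scores over the words they have in common. The sketch below is one way to express that in Python; the function names and all numeric values are hypothetical and chosen only to illustrate the procedure, and the repository itself may have matched words differently.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlate_shared_words(dict_scores, norm_scores):
    """Correlate two word -> score mappings over the words they share."""
    shared = sorted(set(dict_scores) & set(norm_scores))
    x = [dict_scores[w] for w in shared]
    y = [norm_scores[w] for w in shared]
    return len(shared), pearson_r(x, y)


# Toy values, invented for illustration only (not taken from either dataset)
dict_valence = {"happy": 1.6, "calm": 1.1, "angry": -1.0, "tired": -0.3, "qwzx": 0.2}
norm_valence = {"happy": 8.5, "calm": 6.9, "angry": 2.5, "tired": 3.3}

n_shared, r = correlate_shared_words(dict_valence, norm_valence)
```

Because Pearson's r is invariant to linear rescaling, the two sources do not need to share a rating scale for this comparison to be meaningful.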

Second, the 3d affect dictionary was compared with the 14k ratings from Warriner et al. in terms of scoring extended pieces of text. These consisted of sentences from Amazon reviews, entire IMDb reviews, and sentences from Yelp reviews, curated as part of [From Group to Individual Labels using Deep Features, Kotzias et al., KDD 2015](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). Each sentence/review was labeled with a binary 1/0 for positive vs. negative, which we attempted to predict using the valence dimensions of both dictionaries. Results favored the 3d affect dictionary over the 14k human-rated words for two of the three validation sets, despite the far smaller set of words originally normed in the affect dictionary (accuracy for 3d affect vs. Warriner et al.): Amazon, 60% vs. 69%; IMDb, 73% vs. 70%; and Yelp, 73% vs. 69%. This superior performance was achieved in part, though not completely, because the 3d affect dictionary's large number of tokens allowed it to score every piece of text.
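
The role of token coverage can be seen in a small Python sketch of dictionary-based scoring. The README does not state how word scores were aggregated or thresholded, so averaging the valence of matched tokens and labeling a text positive when the mean exceeds 0 are assumptions here, as are the function names and the toy dictionaries.

```python
import re


def score_text(text, valence):
    """Mean valence of the tokens found in the dictionary, or None if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [valence[t] for t in tokens if t in valence]
    if not hits:
        return None  # this text cannot be scored with this dictionary
    return sum(hits) / len(hits)


def classify(text, valence, threshold=0.0):
    """Label text 1 (positive) or 0 (negative); None when no token is covered."""
    score = score_text(text, valence)
    if score is None:
        return None
    return 1 if score > threshold else 0


# Toy dictionaries with invented values: a small one and a broader one
small = {"great": 1.5, "terrible": -1.4}
broad = {"great": 1.5, "terrible": -1.4, "tasty": 1.2, "soggy": -0.9, "fries": 0.1}

reviews = [("The fries were soggy.", 0), ("Great, tasty food!", 1)]
small_preds = [classify(text, small) for text, _ in reviews]
broad_preds = [classify(text, broad) for text, _ in reviews]
```

In this toy case the smaller dictionary leaves the first review unscored, while the broader one labels both, which mirrors the coverage advantage described above.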

Please cite the paper that originally derived the 3-dimensional model of affect from analysis of patterns of brain activity associated with mental state representation: [Tamir, D. I., Thornton, M. A., Contreras, J. M., & Mitchell, J. P. (2016). Neural evidence that three dimensions organize mental state representation: Rationality, social impact, and valence. Proceedings of the National Academy of Sciences of the United States of America, 113(1), 194-199.](http://markallenthornton.com/cv/TamirThornton_PNAS_2016.pdf)

This dictionary was built as part of [Methods in Neuroscience at Dartmouth (MIND)](https://summer-mind.github.io/index.html), 2018.
13,916 changes: 13,916 additions & 0 deletions Ratings_Warriner_et_al.csv

Large diffs are not rendered by default.

105 changes: 105 additions & 0 deletions create_dictionary.R
@@ -0,0 +1,105 @@


# load packages -----------------------------------------------------------

if(!require(e1071)) install.packages("e1071"); require(e1071)
library(data.table)

# define functions --------------------------------------------------------

rmse <- function(x,y){
return(sqrt(mean((x-y)^2)))
}

# load (small) data -------------------------------------------------------

# principal component scores
pcs <- read.csv("pc166.csv")
states <- as.character(pcs$states)
pcs <- pcs[,2:4]
# flip the sign of the first component
pcs[,1] <- -pcs[,1]
rownames(pcs)<-states
#pcs <- apply(pcs,2,rs01)

# vectors for states
svecs <- read.csv("state_vectors.csv",header=F)
rownames(svecs)<-svecs$V1
svecs <- svecs[,2:301]


# test method on state words ----------------------------------------------

# svm-regression version: 5-fold cross-validation
set.seed(1)
cvinds <- sample(rep(1:5,34)[1:166])  # random fold label (1-5) for each state
cperf <- matrix(NA,5,3)  # correlations: folds x dimensions
eperf <- matrix(NA,5,3)  # RMSEs: folds x dimensions
for (i in 1:5){
  tsel <- cvinds == i
  pcs.train <- pcs[!tsel,]
  svecs.train <- svecs[!tsel,]
  pcs.test <- pcs[tsel,]
  svecs.test <- svecs[tsel,]
  for (j in 1:3){
    y <- pcs.train[,j]
    x <- as.matrix(svecs.train)
    fit <- svm(y~x,kernel="radial",cost = 4)
    x <- as.matrix(svecs.test)
    preds <- predict(fit,x)
    cperf[i,j]<-cor(preds,pcs.test[,j])
    eperf[i,j]<-rmse(preds,pcs.test[,j])
  }
}
# average performance across folds for each dimension
colMeans(cperf)
colMeans(eperf)

# fit full model (one SVM per dimension, trained on all 166 states)
svmlist <- list()
for (i in 1:3){
  y <- pcs[,i]
  x <- as.matrix(svecs)
  svmlist[[i]]<-svm(y~x,kernel="radial",cost = 4)
}


# cycle through vectors ---------------------------------------------------

#fast <- fread("./crawl-300d-2M.vec/crawl-300d-2M.vec",skip = 1)
#save(fast,file="fast.Rdata")

load("fast.Rdata")
fast <- as.data.frame(fast)



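tokens <- as.character(fast$V1)
nvec <- dim(fast)[1]
dict <- matrix(NA,nvec,3)  # tokens x dimensions
start <- proc.time()
for (v in 1:nvec) {
  # score one token at a time with each of the three fitted models
  x <- matrix(as.numeric(fast[v,2:301]),1,300)
  for (i in 1:3){
    dict[v,i] <- predict(svmlist[[i]],x)
  }
  # report progress and elapsed time every 1000 tokens
  if ((v %% 1000)==0){
    print(v)
    print(proc.time()-start)
  }
}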

save(tokens,dict,file = "3daffect.Rdata")

rownames(dict)<-tokens
colnames(dict)<-gsub("\\."," ",colnames(pcs))
write.csv(dict,"3daffect_dict.csv")

# 2 million tokens with weighted valence vs:
# 407 positive emotion and 501 negative emotion words in LIWC
# 1747 positive and 4086 negative emotion words in qdap







37 changes: 37 additions & 0 deletions extract_ms_vec.py
@@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
"""
extract_ms_vec.py
Created on Fri Jul 13 13:45:15 2018
@author: mthornton
"""


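import io
import csv

# read the mental state words (first column of pc166.csv)
with open('pc166.csv', 'r', newline='') as dfile:
    reader = csv.reader(dfile)
    header = next(reader)  # skip the header row
    states = []
    for row in reader:
        states.append(row[0])


def load_vectors(fname, whitelist):
    """Return {token: vector} for whitelisted tokens in a fastText .vec file."""
    whitelist = set(whitelist)  # constant-time membership tests
    data = {}
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        n, d = map(int, fin.readline().split())  # header: vocabulary size, dimension
        for line in fin:
            tokens = line.rstrip().split(' ')
            if tokens[0] in whitelist:
                data[tokens[0]] = list(map(float, tokens[1:]))
    return data


data = load_vectors('./crawl-300d-2M.vec/crawl-300d-2M.vec', states)
# write one row per state: the word followed by its vector
with open('state_vectors.csv', 'w', newline='') as dfile:
    writer = csv.writer(dfile)
    for s in states:
        writer.writerow([s] + data[s])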
167 changes: 167 additions & 0 deletions pc166.csv
@@ -0,0 +1,167 @@
states,rationality,social impact,valence
admiration,0.025735943,0.934132467,1.470302876
affection,0.767795367,1.170277563,1.908797332
agitation,0.462970394,0.687185098,-0.874492512
alarm,-0.246642861,0.975805975,-0.990128624
alertness,-1.280707584,0.942608723,0.100091573
amazement,0.930492177,0.858250183,1.124844325
ambivalence,-0.471409341,-0.915971314,0.161737969
amusement,0.853737842,0.704328792,1.433350513
anger,0.782410082,1.367393318,-0.999438824
annoyance,0.582473308,0.765227922,-0.890200227
anticipation,-0.804949979,0.957129685,0.213377146
anxiety,0.623693035,0.56697936,-0.961400457
apathy,0.33371892,-1.169374348,0.07021582
appreciation,0.036759909,0.621946941,1.513059635
apprehension,-0.223149786,0.149008738,-0.46139021
attention,-2.071314946,1.194434957,0.395945101
awareness,-1.7331338,0.760633882,0.669893114
awe,0.958188864,0.285491368,1.068144643
belief,-1.164222181,-0.055190077,0.733197076
bewilderment,0.292159607,-0.117887844,-0.408392165
bias,-1.291705368,0.537925576,-0.783441722
bitterness,0.902736365,0.054791158,-0.919873541
boredom,0.208353031,-1.6741052,-0.498166666
calmness,0.507993858,-1.730711581,2.096434335
certainty,-1.999659272,-0.237838334,0.626624182
cheerfulness,1.039182597,1.040732284,1.565231309
cognition,-2.60731945,-0.262765207,0.280897804
concern,-0.230170693,0.677796353,0.604795361
confusion,-0.147566988,-0.19840759,-0.651234408
consciousness,-1.425032411,-0.279013366,0.689648498
contemplation,-1.655585131,-1.450600284,0.586508288
contempt,0.161739517,0.527089791,-0.652786184
contentment,0.507450773,-0.693323948,1.519139441
craziness,0.099959294,0.742865406,-1.037267544
curiosity,-0.728232578,0.708204937,0.891215141
decision,-2.912353287,0.463439667,0.290418857
delight,1.112818673,0.7322698,1.66872562
depression,1.134585171,-2.001262803,-0.491519168
derangement,-0.184346664,-0.513263745,-1.230575853
desire,0.488863009,1.380981444,0.988605747
despair,1.100415892,-0.730222971,-0.786994364
disappointment,0.806179654,-0.742747272,-0.726481688
disarray,-0.584116549,-0.332136599,-1.247572119
disbelief,-0.256645308,-0.491204232,-0.621827801
disgust,0.858473605,0.311946472,-1.073709635
distress,0.712297686,0.47941285,-1.109223142
distrust,-0.21655778,0.649425742,-0.767877049
dominance,-1.759922272,1.928856793,-0.412518607
doubt,-0.429298397,-0.705583079,-0.769908968
dread,0.822571661,-0.137657165,-1.02268029
dreaminess,0.735103015,-1.627253626,1.232698315
drowsiness,0.177340652,-2.274985871,-0.252875755
drunkenness,-0.417283191,-0.27763094,-0.863196523
earnestness,-0.404476667,0.112791241,1.053073347
ecstasy,1.126134307,0.800604027,1.045231939
elation,1.073494673,0.58843831,1.409527858
embarrassment,1.031733956,0.93759749,-0.91711799
emotion,1.210577778,0.401021047,0.970995376
empathy,0.723784498,0.3845917,1.661972548
enjoyment,0.935932691,0.795059155,1.777295143
enthusiasm,0.428051508,1.067270888,1.428542338
envy,0.683901194,1.308299495,-0.864815885
exaltation,0.263656941,0.147047278,0.593544531
exasperation,0.303592025,0.253141476,-0.773409123
excitement,0.955982195,1.188626036,1.536104261
exhaustion,-0.011743445,-1.487878285,-0.48808895
expectation,-1.569158964,0.610184619,0.203231102
fascination,-0.037208454,0.757566307,1.088721773
fatigue,-0.050033009,-2.088650741,-0.445218137
fear,0.636304835,0.977200688,-1.149661429
feeling,1.162671176,-0.032149668,1.098617284
frenzy,-0.090129697,1.243834991,-0.851518066
friendliness,0.272890394,1.285582603,2.056562312
frustration,0.493072453,0.693277327,-0.936062849
fury,0.892291071,0.979068273,-0.91685364
gloominess,1.327704318,-1.997903844,-0.369805244
guilt,0.820951937,-0.033434889,-1.039157727
hallucination,-0.1566409,-0.664213273,-1.03754451
happiness,1.277805118,0.928183019,1.717882897
hate,0.764051526,1.199128006,-1.054020357
hope,0.590053519,0.163786244,1.524199869
horror,0.954998966,0.603295938,-1.430461528
humiliation,0.705749249,1.014438552,-1.100852745
humor,0.250919771,1.028003588,1.305810003
hunger,-0.672408113,-0.162040541,-1.102407745
hypnosis,-0.326613767,-1.473702971,-0.149335716
hysteria,0.446137404,0.883212903,-1.317924735
imagination,-0.700668729,-0.251757854,0.909582201
impatience,-0.027108914,0.70817421,-0.976842167
indecisiveness,-1.014938031,-0.6175013,-0.8284021
indifference,-0.222811792,-1.106584471,-0.263220581
insanity,-0.018410371,0.30734046,-1.375482973
inspiration,-0.184287013,0.462821406,1.083639321
intention,-2.360469782,0.071133008,0.169851097
interconnectedness,-0.724569294,0.521660848,0.997159918
interest,-1.174122294,0.851577804,0.968739327
intrigue,-0.721951986,0.589300222,0.772841434
irritation,0.789273414,0.387069228,-0.891181025
jealousy,0.724690361,1.493598668,-0.937239668
judgment,-2.051066083,0.975138778,-0.611428555
laziness,-0.318537991,-2.018456689,-0.493568802
lethargy,-0.133967929,-2.146007407,-0.637809597
loneliness,1.190104587,-1.343440762,-0.423789104
lust,0.593161585,1.87037369,-0.064433537
melancholy,1.065616809,-1.621275676,-0.30645836
memory,-1.678326693,-0.379380457,0.295704699
misery,1.201442192,-1.005439684,-0.615879251
mortification,0.525026445,0.594380276,-1.415757602
nervousness,0.523544216,0.536682638,-0.840736609
objectivity,-2.434491368,-0.486333397,-0.097467945
opinion,-2.240860378,0.444946614,-0.056029523
optimism,0.182095073,0.254426998,1.239824201
outrage,0.624975418,1.366724637,-1.208603422
pain,0.311639229,0.128627509,-1.305662996
panic,0.472117106,1.141795258,-1.253970669
patience,-0.457746501,-1.015309284,1.559286726
peacefulness,0.917163542,-1.558029508,2.095240772
pensiveness,-0.856712666,-1.671660446,0.388304374
pity,0.878068344,-0.312653053,-0.11252334
planning,-3.040647578,0.30379917,0.097109337
playfulness,0.300856326,1.388022582,1.517898961
pleasure,1.033172002,0.963773989,1.666223477
prejudice,-0.719327751,0.977182844,-1.019037088
preoccupation,-1.176342318,-0.373516086,-0.462258258
pride,0.205683604,0.852948563,0.604321435
rage,0.882214652,1.007879331,-1.119140046
reason,-2.588647603,-0.445249356,0.418709918
regret,0.609623306,-0.444317374,-0.960113841
relaxation,0.380209871,-1.696477446,2.062468813
relief,0.650330124,-0.948259161,1.612633886
remorse,0.945827391,-0.644603885,-0.410454495
resentment,0.498443142,0.513393639,-0.857308056
sadness,1.376014495,-1.261007903,-0.376957422
satisfaction,0.081217726,-0.109882404,1.595398554
self-consciousness,-0.670412183,-0.05252964,-0.232127006
self-control,-2.253707557,-0.567294539,0.673667065
self-pity,0.625739869,-1.467834491,-0.702620504
serenity,0.725609505,-1.666420357,1.868379989
seriousness,-0.955755512,-0.429136604,0.082920376
shame,0.988499388,0.190767077,-0.877431917
shock,0.520907678,0.848194883,-0.945998101
skepticism,-1.47700388,-0.280978123,-0.495071272
sleepiness,0.156886642,-2.334579375,0.105937619
sorrow,1.28782441,-1.132810494,-0.484308968
stress,-0.12450088,0.716216123,-1.180044023
stupor,-0.195512437,-1.323397174,-0.713699257
subordination,-1.544128211,0.16431237,-0.709366687
surprise,0.709284281,1.272726663,0.540075561
suspicion,-0.651074179,0.835240268,-0.825442141
sympathy,1.090717155,0.189513793,1.581141338
terror,0.786830149,1.107572377,-1.204229488
thirst,-0.711399009,-0.543800449,-0.87094664
thought,-2.059181394,-0.61006483,0.483747719
tiredness,0.322704099,-2.349754787,-0.284058492
torpor,-0.323254193,-0.737709443,-0.655607838
trance,0.179435771,-1.713174507,-0.027014016
transcendence,-0.296524665,-0.878526428,0.528111678
uncertainty,-0.76020538,-0.600989919,-0.603184049
uneasiness,0.709935592,-0.608034251,-0.671407087
unhappiness,1.147245504,-1.264787326,-0.418512669
vengeance,-0.111756432,1.383179525,-1.087977539
wakefulness,-0.796218287,0.060004176,0.323072393
warmth,0.720756579,0.034332893,1.789463826
weariness,0.290027086,-1.705178624,-0.477876764
woe,0.784806258,-0.959956956,-0.598675717
worry,0.568021766,0.200556564,-0.96988099