StanceSSR

A collection of stance detection datasets based on SemEval2016 Task 6 to assert whether stance detection models understand the task or simply overfit the training data.

Description of the used data

These data are based on SemEval2016 Shared Task 6 "Stance Detection on Tweets", see the detailed description in this paper and on this site.

In this following, we describe the data used in the ssr-paper. The structure of the constructed SemEval2016_SSR.json looks like this:

{
    "all_instances":[] 
    
    "data_split_by_target": {
            "Hillary Clinton": {
                "train": list, indices in "all_instances",
                "valid": list, indices in "all_instances",
                "test": list, indices in "all_instances"
            },
            ...
            "mix": mixture of 5 targets used in subtask A
    
    "label_mapping": {
        "word2idx": {}, 
        "idx2word": {},
        ...
        "hasSenti2idx":  
        "targetIn2idx": dict, mapping target_in_tweet, target_not_in_tweet into indices
        }
    
    "hard_sets": {
        "tweet_only": TOF, samples in which 3 tweet_only BERT models failed
        "pmi": PMI, samples containing long-tailed features according to PMI scores
        "opinion_towards": OT, samples whose targets are implicitly or indirectly mentioned
        "dt": DT, 
        "target_mismatch": Replaced, samples whose target is replaced with another target that does not appear in the tweet
        "negate": Negated, samples whose targets are replaced their negated counterpart
    }
}

One sample in all_instances looks like this:

{
    "text": "Keep the faith. The most amazing things in life tend to happen right at the moment you're about to give up hope. #SemST", 
    "target": "Atheism", 
    "label": "AGAINST", 
    "opinion_towards": "1.  The tweet explicitly expresses opinion about the target, a part of the target, or an aspect of the target.", 
    "sentiment": "pos", 
    "orig_idx": 400,
    "target_in_tweet": "in_tweet", 
    "has_senti": "pos", 
    "stance_towards_target": "AGAINST",
}

And one sample in the hard set opinion_towards looks like this:

{
    "text": "@someone Giving birth is not pushing women toward death. #SemST", 
    "target": "Legalization of Abortion", 
    "label": "AGAINST", 
    "opinion_towards": "2. The tweet does NOT expresses opinion about the target but it HAS opinion about something or someone other than the target.", 
    "sentiment": "neg", 
    "orig_idx": 3897, 
    "target_in_tweet": "not_in_tweet", 
    "stance_towards_target": "AGAINST",
}

where "orig_idx" refers to the original sample in all_instances.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StanceSSR

Description of the used data

About

Releases

Packages

Surpriseshelf/StanceSSR

Folders and files

Latest commit

History

Repository files navigation

StanceSSR

Description of the used data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages