MOL - Multilingual Offensive Lexicon Annotated with Contextual Information

MOL is the first specialized lexicon for hate speech detection annotated with contextual information. It comprises 1,000 explicit and implicit human-annotated rationales with pejorative connotations, which were manually identified by a linguist and labeled by three different annotators as either context-dependent or context-independent. For example, "stupid" is a context-independent offensive term because it is mostly found in pejorative contexts of use. In contrast, "useless" and "worm" are context-dependent offensive terms because they occur in both contexts of use: (i) with a non-pejorative connotation, as in "this smartphone is useless" or "the fisherman uses worms for bait", and (ii) with a pejorative connotation, as in "this last President was useless" or "this human being is such a worm".
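The distinction between the two classes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LexiconEntry` type, the three-term `SAMPLE_LEXICON` (built only from the examples above), and the `match_terms` helper are hypothetical and do not reflect the actual file layout or any official loader for MOL.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """Hypothetical in-memory form of one lexicon entry."""
    term: str
    context_independent: bool  # True: offensive regardless of context


# Hypothetical three-term sample taken from the examples in this README,
# not the real lexicon contents or format.
SAMPLE_LEXICON = {
    "stupid": LexiconEntry("stupid", True),
    "useless": LexiconEntry("useless", False),
    "worm": LexiconEntry("worm", False),
}


def match_terms(text, lexicon=SAMPLE_LEXICON):
    """Return (term, status) pairs for every lexicon term found in text.

    Context-independent terms are reported as "offensive"; context-dependent
    terms are reported as "needs_context" because the lexicon alone cannot
    decide whether the use is pejorative.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for token in tokens:
        entry = lexicon.get(token)
        if entry is not None:
            status = "offensive" if entry.context_independent else "needs_context"
            hits.append((entry.term, status))
    return hits


print(match_terms("This last President was useless"))
print(match_terms("That was a stupid remark"))
```

The sketch matches exact lowercase tokens only, so an inflected form such as "worms" is not found. A real pipeline would likely need lemmatization and a separate model or rule set to resolve the "needs_context" cases.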

The Multilingual Offensive Lexicon was extracted manually by a linguist from the HateBR dataset, and each term and expression was annotated by three different annotators, obtaining a high inter-annotator agreement score (73% Kappa). MOL was originally written in Portuguese and manually translated by native speakers, taking cultural adaptations into account, into English, Spanish, French, German, and Turkish. MOL is therefore available in six languages.
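As an illustration of how agreement among three annotators can be measured, the sketch below computes Fleiss' kappa in plain Python. This README does not state which kappa variant produced the reported score, so the choice of Fleiss' kappa is an assumption, and the `fleiss_kappa` function and the toy ratings are hypothetical, not the MOL annotation data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of rater labels.

    Every item must have been labeled by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Toy ratings: 4 terms, 3 annotators, 1 = context-independent, 0 = context-dependent.
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(round(fleiss_kappa(toy_ratings), 4))  # 0.6571
```

In the toy data the annotators disagree on one of four terms, which gives a kappa of 23/35, about 0.657.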

The tables below summarize the MOL statistics.

Contextual Information

| class | label | total |
| --- | --- | --- |
| Context-independent offensive | 1 | 612 |
| Context-dependent offensive | 0 | 387 |
| Total | | 1,000 |

Hate Targets

| class | total |
| --- | --- |
| no-hate | 864 |
| partyism | 69 |
| sexism | 35 |
| homophobia | 16 |
| fatphobia | 9 |
| religious intolerance | 9 |
| antisemitism | 1 |
| apology for the dictatorship | 5 |
| racism | 4 |
| antisemitism | 3 |
| total | 1,000 |

CITING

Vargas, F., Carvalho, I., Pardo, T.A.S., Benevenuto, F. (2024). Context-Aware and Expert Data Resources for Brazilian Portuguese Hate Speech Detection. Natural Language Processing Journal. Cambridge University Press. pp.1-22. https://www.cambridge.org/core/journals/natural-language-processing/article/contextaware-and-expert-data-resources-for-brazilian-portuguese-hate-speech-detection/7D9019ED5471CD16E320EBED06A6E923#.


BIBTEX

@article{Vargas_Carvalho_Pardo_Benevenuto_2024,
  author={Vargas, Francielle and Carvalho, Isabelle and Pardo, Thiago A. S. and Benevenuto, Fabrício},
  title={Context-aware and expert data resources for Brazilian Portuguese hate speech detection},
  DOI={10.1017/nlp.2024.18},
  journal={Natural Language Processing},
  year={2024},
  pages={1–22},
}