This repository contains the code for our paper "Vietnamese Hate and Offensive Detection using PhoBERT-CNN and Social Media Streaming Data".
The paper is available at: https://arxiv.org/abs/2206.00524
To run the code, please open and execute the Jupyter notebooks (*.ipynb) in the repository folders.
- PhoBERT: Pre-trained language models for Vietnamese - https://github.com/VinAIResearch/PhoBERT
- Convolutional Neural Networks for Sentence Classification - https://github.com/yoonkim/CNN_sentence
- Apache Spark: a unified engine for big data processing - https://spark.apache.org/docs/3.1.1
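For readers unfamiliar with the CNN component cited above, here is a minimal, dependency-free sketch of the sentence-level convolution from Kim's CNN for sentence classification: each filter slides over consecutive token vectors, applies a ReLU, and max-over-time pooling keeps the strongest response per filter. This is an illustrative sketch only, not the implementation used in our notebooks. It assumes the input is a sequence of token vectors (for example, contextual embeddings such as those produced by PhoBERT), and the function names and toy numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = embedding dimensions


def conv1d_feature_map(embeddings: Matrix, filt: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one filter of height h over the token axis (stride 1, no padding)
    and return the ReLU-activated feature map c_1 .. c_{n-h+1}."""
    h, n = len(filt), len(embeddings)
    if n < h:
        raise ValueError("sentence is shorter than the filter height; pad it first")
    feature_map = []
    for i in range(n - h + 1):
        window = embeddings[i:i + h]
        s = bias
        # Dot product between the filter and the h-token window.
        for w_row, x_row in zip(filt, window):
            s += sum(w * x for w, x in zip(w_row, x_row))
        feature_map.append(max(0.0, s))  # ReLU non-linearity
    return feature_map


def cnn_sentence_features(embeddings: Matrix, filters: List[Matrix], biases: List[float]) -> List[float]:
    """Apply every filter and keep one value per filter via max-over-time pooling."""
    return [max(conv1d_feature_map(embeddings, f, b)) for f, b in zip(filters, biases)]


if __name__ == "__main__":
    # Hypothetical toy input: 4 tokens with 2-dimensional embeddings.
    E = [[1, 0], [0, 1], [1, 1], [0, 0]]
    F1 = [[1, 0], [0, 1]]               # filter of height 2
    F2 = [[1, 1], [-1, -1], [1, 1]]     # filter of height 3
    print(cnn_sentence_features(E, [F1, F2], [0.0, -0.5]))
```

In the full model of Kim's paper, the pooled features from many filters of several heights are concatenated and passed through dropout and a softmax layer to predict the class label.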
If you use any code or data from this repository, please cite our paper:
@article{quoc2023vietnamese,
title={Vietnamese hate and offensive detection using PhoBERT-CNN and social media streaming data},
author={Quoc Tran, Khanh and Trong Nguyen, An and Hoang, Phu Gia and Luu, Canh Duc and Do, Trong-Hop and Van Nguyen, Kiet},
journal={Neural Computing and Applications},
volume={35},
number={1},
pages={573--594},
year={2023},
publisher={Springer}
}
For any questions, please contact our corresponding author: Mr. Khanh Quoc Tran at [email protected].