Semantic Clustering of Italian Political News on Facebook

This repository contains the code and data supporting the working paper "Semantic Clustering of Italian Political News on Facebook: Comparing text-embedding-3-large and UmBERTo Embeddings using HDBSCAN and K-means".

Overview

This study compares the performance of OpenAI's text-embedding-3-large model against the BERT-based UmBERTo model for clustering Italian political news content. We utilize two distinct datasets of political news stories circulated on Facebook before the 2018 and 2022 Italian elections.
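As a rough illustration of the K-means side of this comparison, the sketch below runs Lloyd's algorithm on a handful of toy 2-D vectors that stand in for real embeddings. It is not the repository's implementation: the function name `kmeans`, its parameters, and the toy data are all assumptions made for this example, and real embeddings from text-embedding-3-large or UmBERTo have far more dimensions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length float vectors.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D vectors standing in for real embeddings (assumed data).
toy_embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.95, 5.05],
]
labels, centroids = kmeans(toy_embeddings, k=2)
```

Unlike HDBSCAN, K-means needs the number of clusters `k` chosen in advance and assigns every point to a cluster, which is one reason the two methods can yield different groupings on the same embeddings.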

Repository Contents

  • /: R and Python scripts for data processing, embedding generation, clustering, and analysis

  • rawdata/: Titles and descriptions of 35,795 links circulated on Facebook prior to the 2018 and 2022 Italian elections. A sample of link pairs coded for thematic coherence by human experts, in JSONL

  • output/: Empty output folders

  • output/: Empty data folders

  • LICENSE: License information for the project
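JSONL stores one JSON object per line, so the files in rawdata/ can be read one line at a time. The sketch below shows one way to do that, under stated assumptions: the field names `title` and `description` and the helper `read_jsonl` are hypothetical, and the actual keys in the files may differ.

```python
import io
import json


def read_jsonl(handle):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical records: the real field names in rawdata/ may differ.
sample = io.StringIO(
    '{"title": "Titolo A", "description": "Descrizione A"}\n'
    "\n"
    '{"title": "Titolo B", "description": "Descrizione B"}\n'
)
records = list(read_jsonl(sample))

# Join title and description into a single string per link.
texts = [f'{r["title"]}. {r["description"]}' for r in records]
```

To read a file on disk, pass an open file object instead of the in-memory sample, for example `open(path, encoding="utf-8")`, where `path` points to one of the JSONL files.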

Contact

For questions or feedback, please open an issue in this repository or contact Fabio Giglietto.