Ensure consistent column names in Twitter datasets

Twitter changed the set of metrics included in its tweet data at some point. Because map_item copied these metrics directly from the underlying JSON into the mapped 4CAT item, datasets could end up with inconsistent columns. All known metrics are now always included, with a value of 0 when a metric is absent from the original JSON.
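The effect of this change can be illustrated with a minimal, standalone sketch. The helper name `normalise_metrics`, the constant `KNOWN_METRICS`, and the two sample tweets are hypothetical and exist only for illustration; in the commit itself the dictionary comprehension sits inline in `map_item`. Unlike the committed code, the sketch also tolerates a missing `public_metrics` key, which is an assumption rather than something the commit does.

```python
# Hypothetical constant: the six metric names listed in this commit.
KNOWN_METRICS = (
    "impression_count", "retweet_count", "bookmark_count",
    "like_count", "quote_count", "reply_count",
)


def normalise_metrics(tweet):
    """Return every known metric, defaulting to 0 when the JSON omits it.

    Hypothetical helper, not part of 4CAT. Falling back to an empty dict
    when "public_metrics" is missing is an assumption of this sketch.
    """
    raw = tweet.get("public_metrics", {})
    return {key: raw.get(key, 0) for key in KNOWN_METRICS}


# Made-up sample tweets: one with only four metrics, one with all six.
older_tweet = {
    "id": "1",
    "public_metrics": {
        "retweet_count": 3, "like_count": 5,
        "quote_count": 0, "reply_count": 1,
    },
}
newer_tweet = {
    "id": "2",
    "public_metrics": {
        "retweet_count": 2, "like_count": 7, "quote_count": 1,
        "reply_count": 0, "impression_count": 120, "bookmark_count": 4,
    },
}

older_metrics = normalise_metrics(older_tweet)
newer_metrics = normalise_metrics(newer_tweet)

# Both mapped items now expose the same columns in the same order.
assert list(older_metrics) == list(newer_metrics)
```

Note that selecting from a fixed tuple of keys also means any metric not in that tuple is left out of the mapped item, so the list would need extending if further metrics appear.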
digitalmethodsinitiative · Feb 6, 2024 · ce2b2d5
1 parent 7e02e36
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/datasources/twitterv2/search_twitter.py b/datasources/twitterv2/search_twitter.py
@@ -778,6 +778,8 @@ def map_item(tweet):
             if variants:
                 videos.append(variants[0].get('url'))
 
+        public_metrics = {k: tweet["public_metrics"].get(k, 0) for k in ("impression_count", "retweet_count", "bookmark_count", "like_count", "quote_count", "reply_count")}
+
         return {
             "id": tweet["id"],
             "thread_id": tweet.get("conversation_id", tweet["id"]),
@@ -793,7 +795,7 @@ def map_item(tweet):
             "source": tweet.get("source"),
             "language_guess": tweet.get("lang"),
             "possibly_sensitive": "yes" if tweet.get("possibly_sensitive") else "no",
-            **tweet["public_metrics"],
+            **public_metrics,
             "is_retweet": "yes" if is_retweet else "no",
             "retweeted_user": "" if not is_retweet else retweeted_user,
             "is_quote_tweet": "yes" if is_quoted else "no",