Ensure consistent column names in Twitter datasets
Twitter changed the set of metrics included in its tweet data at some point. Because these metrics were copied directly from the underlying JSON into the mapped 4CAT item, this could lead to inconsistent column names between items. Now all known metrics are always included, with a value of 0 when a metric is missing from the original JSON.
stijn-uva committed Feb 6, 2024
1 parent 7e02e36 commit ce2b2d5
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion datasources/twitterv2/search_twitter.py
@@ -778,6 +778,8 @@ def map_item(tweet):
if variants:
videos.append(variants[0].get('url'))

+ public_metrics = {k: tweet["public_metrics"].get(k, 0) for k in ("impression_count", "retweet_count", "bookmark_count", "like_count", "quote_count", "reply_count")}
+
return {
"id": tweet["id"],
"thread_id": tweet.get("conversation_id", tweet["id"]),
@@ -793,7 +795,7 @@ def map_item(tweet):
"source": tweet.get("source"),
"language_guess": tweet.get("lang"),
"possibly_sensitive": "yes" if tweet.get("possibly_sensitive") else "no",
- **tweet["public_metrics"],
+ **public_metrics,
"is_retweet": "yes" if is_retweet else "no",
"retweeted_user": "" if not is_retweet else retweeted_user,
"is_quote_tweet": "yes" if is_quoted else "no",
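
The effect of the change can be illustrated with a small standalone sketch. This is not the 4CAT code itself: the helper name `normalise_public_metrics`, the `KNOWN_METRICS` constant, and the two sample tweets are hypothetical, and unlike the committed line the sketch also tolerates a tweet with no `public_metrics` key at all. It only shows how a fixed key list with a default of 0 yields the same columns for every item.

```python
# Hypothetical illustration; names and sample data are assumptions, not 4CAT code.
KNOWN_METRICS = (
    "impression_count", "retweet_count", "bookmark_count",
    "like_count", "quote_count", "reply_count",
)


def normalise_public_metrics(tweet):
    """Return every known metric, using 0 for any the source JSON omitted."""
    # Assumption: fall back to an empty dict if the key is absent entirely.
    metrics = tweet.get("public_metrics", {})
    return {key: metrics.get(key, 0) for key in KNOWN_METRICS}


# A tweet shaped like older data, lacking impression and bookmark counts.
old_tweet = {
    "id": "1",
    "public_metrics": {
        "retweet_count": 2, "like_count": 5,
        "quote_count": 0, "reply_count": 1,
    },
}

# A tweet shaped like newer data, with all six metrics present.
new_tweet = {
    "id": "2",
    "public_metrics": {
        "impression_count": 100, "retweet_count": 0, "bookmark_count": 3,
        "like_count": 7, "quote_count": 1, "reply_count": 0,
    },
}

old_row = normalise_public_metrics(old_tweet)
new_row = normalise_public_metrics(new_tweet)

# Both rows now expose the same keys in the same order.
print(list(old_row) == list(new_row))
```

One consequence of building the dictionary from a fixed tuple is that any metric not in the list is dropped rather than passed through, so a metric Twitter adds later would need to be added to the tuple explicitly.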
