Set UTF-8 encoding when opening stop words (fixes Windows bug)
sal-uva committed Aug 21, 2024
1 parent a03e5fd commit 1d749c3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion processors/text-analysis/tokenise.py
@@ -239,7 +239,7 @@ def process(self):
 numbers = re.compile(r"\b[0-9]+\b")
 
 # load general stopwords dictionary
-with config.get("PATH_ROOT").joinpath("common/assets/stopwords-iso.json").open() as infile:
+with open(config.get("PATH_ROOT").joinpath("common/assets/stopwords-iso.json"), encoding="utf-8") as infile:
     stopwords_iso = json.load(infile)
 
 # Twitter tokenizer if indicated
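
Why the explicit encoding matters: when `open()` is called in text mode without `encoding`, Python falls back to a platform-dependent locale encoding, which on Windows is often a legacy code page rather than UTF-8. The following is a minimal, self-contained sketch of the effect, not code from the repository. The sample data, the file name `stopwords-sample.json`, and the helper `load_stopwords` are hypothetical, and `cp1252` merely stands in for a typical Windows code page.

```python
import json
import tempfile
from pathlib import Path


def load_stopwords(path: Path) -> dict:
    """Load a stop word JSON file, always decoding it as UTF-8."""
    # Passing encoding explicitly makes the result independent of the
    # platform's locale encoding.
    with open(path, encoding="utf-8") as infile:
        return json.load(infile)


# Hypothetical sample data containing non-ASCII stop words.
sample = {"de": ["für", "über"], "ru": ["это"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stopwords-sample.json"
    path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")
    loaded = load_stopwords(path)
    raw = path.read_bytes()

# Simulate a reader that assumes a legacy code page (cp1252 here, as an
# assumed stand-in): decoding either fails or yields mangled text.
try:
    legacy_ok = json.loads(raw.decode("cp1252")) == sample
except UnicodeDecodeError:
    legacy_ok = False

print(loaded == sample, legacy_ok)
```

The same bytes round-trip correctly only when the reader names the encoding the file was written in, which is what the one-line change above does for `stopwords-iso.json`.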