Commit

post_topic_matrix: rename column when tokenizer created multiple documents per post
dale-wahl committed Jan 23, 2025
1 parent 977d887 commit 836a235
Showing 1 changed file with 1 addition and 1 deletion: processors/text-analysis/post_topic_matrix.py
@@ -185,7 +185,7 @@ def process(self):
 #TODO: Either store the original document when tokenized or re-do the sentance split and, possibly, the columns if multiple were used.
 # The second seems unfeasible, but the first would require somehow storing the split documents and then retrieving them here by post_id and document_id; give me a database guys!
 if multiple_docs_per_post:
-    combined_data['document'] = token_data['multiple_docs']
+    combined_data['original_document_split'] = token_data['multiple_docs']

 doc_predictions = model_data['predictions'][str(document_number)]
 combined_data['top_topic(s)'] = ', '.join([str(int(key) + 1) for key, value in doc_predictions.items() if value == max(doc_predictions.values())]) # add one to topic key here as well
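The changed line only affects which key the per-post split documents are stored under. Below is a minimal sketch of how one output row could be assembled. It assumes `token_data` and `model_data` are plain dicts shaped the way the hunk indexes them. The helper name `build_row` and all sample values are hypothetical and are not the processor's actual code. After this commit the split documents land in `original_document_split`, and `top_topic(s)` lists every topic tied for the highest prediction, shifted from 0-based keys to 1-based labels.

```python
def build_row(token_data: dict, model_data: dict, document_number: int,
              multiple_docs_per_post: bool) -> dict:
    """Hypothetical helper mirroring the hunk above; not the processor's real code."""
    combined_data = {}
    if multiple_docs_per_post:
        # Column renamed in this commit (previously written to 'document')
        combined_data['original_document_split'] = token_data['multiple_docs']

    # Predictions are looked up by the document number as a string key
    doc_predictions = model_data['predictions'][str(document_number)]
    top_score = max(doc_predictions.values())
    # Keep every topic tied for the top score; keys are 0-based, labels 1-based
    combined_data['top_topic(s)'] = ', '.join(
        str(int(key) + 1) for key, value in doc_predictions.items() if value == top_score
    )
    return combined_data


# Hypothetical sample data: topics '1' and '2' tie for the highest score
token_data = {'multiple_docs': '<split document info>'}
model_data = {'predictions': {'0': {'0': 0.1, '1': 0.45, '2': 0.45}}}
row = build_row(token_data, model_data, 0, multiple_docs_per_post=True)
print(row)  # {'original_document_split': '<split document info>', 'top_topic(s)': '2, 3'}
```

Computing `max()` once outside the comprehension avoids re-scanning all predictions for every topic. The one-liner in the diff produces the same labels.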

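The TODO in the hunk suggests storing the original split documents at tokenization time and retrieving them later by post ID and document ID. One way to picture that idea is a lookup keyed by `(post_id, document_id)`, sketched below. The class name `SplitDocumentStore` is hypothetical, and the in-memory dict is only a stand-in for whatever persistent storage a real implementation would need.

```python
from typing import Dict, Optional, Tuple


class SplitDocumentStore:
    """Hypothetical in-memory store for split documents, keyed by (post_id, document_id)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, int], str] = {}

    def add(self, post_id: str, document_id: int, text: str) -> None:
        # Record a split document when the tokenizer produces it
        self._docs[(post_id, document_id)] = text

    def get(self, post_id: str, document_id: int) -> Optional[str]:
        # Retrieve it later, e.g. while building the post/topic matrix; None if absent
        return self._docs.get((post_id, document_id))


store = SplitDocumentStore()
# Hypothetical post split into two sentences during tokenization
for doc_id, sentence in enumerate(["First sentence.", "Second sentence."]):
    store.add("post-1", doc_id, sentence)

print(store.get("post-1", 1))  # Second sentence.
print(store.get("post-1", 5))  # None
```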