Add better profanity filtering and filter out empty notifications #74
base: base-sha/7ebe9275f0b7f9d1f2ca65aa4be1b89407f7c1c8
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,7 +9,6 @@ | |
|
||
import apprise | ||
import requests | ||
import scrubadub | ||
import torch | ||
|
||
from transformers import ( | ||
|
@@ -18,6 +17,7 @@ | |
AutoModelForSpeechSeq2Seq, | ||
AutoProcessor, | ||
) | ||
from better_profanity import profanity | ||
|
||
# Before we dig in, let's globally set up transformers | ||
# We will load up the model, etc now so we only need to | ||
|
@@ -76,6 +76,7 @@ | |
torch_dtype=torch_dtype, | ||
device=device, | ||
) | ||
profanity.load_censor_words() | ||
|
||
|
||
def transcribe_transformers(calljson, audiofile): | ||
|
@@ -121,11 +122,8 @@ def send_notifications(calljson, audiofile, destinations): | |
send_notifications(calljson, audiofile, destinations) | ||
""" | ||
|
||
# Scrubadub redacts PII let's try and clean the text before | ||
# goes out the door | ||
scrubber = scrubadub.Scrubber() | ||
scrubber.remove_detector("email") | ||
body = scrubber.clean(calljson["text"]) | ||
# Run ai text through profanity filter | ||
body = profanity.censor(calljson["text"]) | ||
suggestion (testing): Missing test for the new profanity filter functionality. Ensure there is a test that verifies the profanity filter correctly censors inappropriate language and handles edge cases, such as mixed case, special characters, and multiple languages if applicable.
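A minimal sketch of what such a test could look like. It assumes the censoring step is pulled out of `send_notifications` into a hypothetical helper, `censor_body`, that receives the censor callable as an argument. `fake_censor` and the placeholder word `darn` are stand-ins invented for illustration; they do not reproduce the behaviour or word list of `better_profanity`.

```python
import re
from typing import Callable


def censor_body(calljson: dict, censor: Callable[[str], str]) -> str:
    """Hypothetical helper: return the transcript with the censor applied."""
    return censor(calljson["text"])


def fake_censor(text: str) -> str:
    """Stand-in censor (assumption): masks the placeholder word 'darn', ignoring case."""
    return re.sub(r"\bdarn\b", "****", text, flags=re.IGNORECASE)


def run_checks(censor: Callable[[str], str], bad_word: str) -> None:
    """Check mixed-case censoring, clean pass-through, and the empty transcript."""
    # The bad word should disappear regardless of case, leaving the rest intact.
    for variant in (bad_word, bad_word.upper(), bad_word.capitalize()):
        body = censor_body({"text": f"Engine 5 {variant} responding"}, censor)
        assert variant not in body
        assert body.startswith("Engine 5") and body.endswith("responding")

    # Clean text should pass through unchanged.
    clean = "Engine 5 responding"
    assert censor_body({"text": clean}, censor) == clean

    # An empty transcript should stay empty.
    assert censor_body({"text": ""}, censor) == ""


run_checks(fake_censor, "darn")
```

In the project itself, the same checks could be pointed at `profanity.censor` after `profanity.load_censor_words()`, using a word known to be in the loaded list; whether every assertion holds for the real library would need to be confirmed by running it.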
|
||
title = ( | ||
calljson["talkgroup_description"] | ||
+ " @ " | ||
|
@@ -308,16 +306,20 @@ def main(): | |
# Send the json and audiofile to a function to transcribe | ||
# If TTT_DEEPGRAM_KEY is set, use deepgram, else | ||
# if TTT_WHISPER_URL is set, use whisper.cpp else | ||
# fasterwhisper | ||
# transformers | ||
if os.environ.get("TTT_DEEPGRAM_KEY", False): | ||
calljson = transcribe_deepgram(calljson, audiofile) | ||
elif os.environ.get("TTT_WHISPERCPP_URL", False): | ||
calljson = transcribe_whispercpp(calljson, audiofile) | ||
else: | ||
calljson = transcribe_transformers(calljson, audiofile) | ||
|
||
# Ok, we have text back, send for notification | ||
send_notifications(calljson, audiofile, destinations) | ||
# When Whisper processes a file with no speech, it tends to spit out "you" | ||
# Just "you" and nothing else. | ||
# So if the transcript is just "you", don't bother sending the notification, | ||
# we will just delete the files and keep going to the next call. | ||
if calljson["text"] != "you": | ||
send_notifications(calljson, audiofile, destinations) | ||
Comment on lines +321 to +322
suggestion (testing): Missing test for the new condition to filter out empty notifications. It's important to add a test to verify that notifications are not sent when the transcript is just 'you'. This test should cover scenarios where the transcript is exactly 'you', contains 'you' with additional text, and completely different text to ensure the condition works as expected.
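The three scenarios named in this comment can be sketched as follows. The helper names `should_notify` and `process_call` are hypothetical; they mirror the exact-match comparison in the diff (`calljson["text"] != "you"`) so the condition can be exercised without running `main()`.

```python
from typing import Callable


def should_notify(text: str) -> bool:
    """Hypothetical helper mirroring the diff: skip transcripts that are exactly 'you'."""
    return text != "you"


def process_call(calljson: dict, send: Callable[[dict], None]) -> bool:
    """Send the notification only when the transcript passes the check."""
    if should_notify(calljson["text"]):
        send(calljson)
        return True
    return False


# Collect what would have been sent instead of calling a real notifier.
sent = []
for transcript in ("you", "you are clear to proceed", "Engine 5 responding"):
    process_call({"text": transcript}, sent.append)

# Only the exact "you" transcript is dropped.
assert [c["text"] for c in sent] == ["you are clear to proceed", "Engine 5 responding"]
```

Because the comparison is exact, variants such as `"You"` or `" you"` would still be sent. The PR does not say whether Whisper emits such variants, so normalizing the text first (for example with `strip()` and `lower()`) is a design choice left open here.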
|
||
|
||
# And now delete the files from the transcribe directory | ||
try: | ||
|
suggestion (code_refinement): Consider initializing profanity filter setup in a dedicated initialization function.
Placing the initialization of the profanity filter directly in the global scope of the script might lead to issues with maintainability and testing. It's generally a good practice to encapsulate setup logic in a function.
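One possible shape for such a function, as a sketch only: `init_profanity_filter` is a hypothetical name, and it accepts any object that exposes `load_censor_words()` and `censor()`, which are the two calls the diff makes on `profanity`. Passing the filter in keeps the example independent of the installed package and lets a test substitute a fake.

```python
from typing import Callable, Optional

# Cached censor callable; stays None until the first initialization.
_censor: Optional[Callable[[str], str]] = None


def init_profanity_filter(filter_obj) -> Callable[[str], str]:
    """Load the word list once and return the censor callable.

    `filter_obj` is assumed to expose load_censor_words() and censor(),
    as the `profanity` object used in the diff does.
    """
    global _censor
    if _censor is None:
        filter_obj.load_censor_words()  # runs only on the first call
        _censor = filter_obj.censor
    return _censor
```

With this in place, the script could call `censor = init_profanity_filter(profanity)` from `main()` instead of running `profanity.load_censor_words()` at import time, and `send_notifications` could use `censor(calljson["text"])`.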