Add better profanity filtering and filter out empty notifications #74

Open
wants to merge 1 commit into
base: base-sha/7ebe9275f0b7f9d1f2ca65aa4be1b89407f7c1c8
2 changes: 1 addition & 1 deletion environment.yml
@@ -9,7 +9,7 @@ dependencies:
 - ffmpeg
 - pip
 - pip:
-  - scrubadub
+  - better_profanity
   - transformers
   - accelerate
   - pytorch
20 changes: 11 additions & 9 deletions ttt.py
@@ -9,7 +9,6 @@

 import apprise
 import requests
-import scrubadub
 import torch

 from transformers import (
@@ -18,6 +17,7 @@
     AutoModelForSpeechSeq2Seq,
     AutoProcessor,
 )
+from better_profanity import profanity

 # Before we dig in, let's globally set up transformers
 # We will load up the model, etc now so we only need to
@@ -76,6 +76,7 @@
     torch_dtype=torch_dtype,
     device=device,
 )
+profanity.load_censor_words()

suggestion (code_refinement): Consider initializing profanity filter setup in a dedicated initialization function.

Calling profanity.load_censor_words() in the global scope of the script runs the setup whenever the module is imported, which can make the script harder to test and maintain. It's generally good practice to encapsulate setup logic in a function.

Suggested change
-profanity.load_censor_words()
+def initialize_profanity_filter():
+    profanity.load_censor_words()
+if __name__ == "__main__":
+    initialize_profanity_filter()
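
A rough sketch of one way this suggestion could play out, under assumptions: the setup lives in a function that `main()` would call explicitly, and the censor callable is passed to the code that builds the message body, so a test can swap in a stand-in without loading the real word list. `make_censor`, `build_body`, and `fake_censor` are hypothetical names that do not appear in ttt.py, and `frobnicate` is only a placeholder word.

```python
from typing import Callable


def make_censor() -> Callable[[str], str]:
    """Load the word list once and return the censor callable (hypothetical helper)."""
    # Imported here so that importing this module has no setup side effects.
    from better_profanity import profanity

    profanity.load_censor_words()
    return profanity.censor


def build_body(calljson: dict, censor: Callable[[str], str]) -> str:
    """Return the notification body with the transcript run through `censor`."""
    return censor(calljson["text"])


def fake_censor(text: str) -> str:
    """Stand-in for tests: masks a single placeholder word."""
    return text.replace("frobnicate", "****")
```

In this shape, `main()` would call `make_censor()` once and hand the result to whatever builds the body, while tests would pass `fake_censor` instead.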

def transcribe_transformers(calljson, audiofile):
@@ -121,11 +122,8 @@ def send_notifications(calljson, audiofile, destinations):
     send_notifications(calljson, audiofile, destinations)
     """

-    # Scrubadub redacts PII let's try and clean the text before
-    # goes out the door
-    scrubber = scrubadub.Scrubber()
-    scrubber.remove_detector("email")
-    body = scrubber.clean(calljson["text"])
+    # Run ai text through profanity filter
+    body = profanity.censor(calljson["text"])

suggestion (testing): Missing test for the new profanity filter functionality.

Ensure there is a test that verifies the profanity filter correctly censors inappropriate language and handles edge cases, such as mixed case, special characters, and multiple languages if applicable.
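
A minimal sketch of what such a test might look like, using the standard-library `unittest` module. It assumes `better_profanity` is importable and skips itself otherwise. The word list is swapped for a single placeholder term (`frobnicate`, an assumption chosen only to keep the test file clean) so the assertions do not depend on the contents of the default list. The class name and sample sentences are illustrative, not part of the PR.

```python
import unittest

try:
    from better_profanity import profanity
except ImportError:  # lets this file import where the package is absent
    profanity = None

# Placeholder term (assumption) used instead of real profanity.
CUSTOM_WORDS = ["frobnicate"]


@unittest.skipIf(profanity is None, "better_profanity is not installed")
class CensorTests(unittest.TestCase):
    def setUp(self):
        # Replace the default word list with a known one for determinism.
        profanity.load_censor_words(custom_words=CUSTOM_WORDS)

    def tearDown(self):
        # Restore the default word list for any code that runs afterwards.
        profanity.load_censor_words()

    def test_clean_text_is_unchanged(self):
        text = "Engine 5 responding to Main Street"
        self.assertEqual(profanity.censor(text), text)

    def test_listed_word_is_removed(self):
        censored = profanity.censor("please frobnicate the radio")
        self.assertNotIn("frobnicate", censored)

    def test_mixed_case_is_removed(self):
        censored = profanity.censor("please FrobNicate the radio")
        self.assertNotIn("frobnicate", censored.lower())
```

The tests assert only that the listed word no longer appears, rather than pinning the exact replacement string, so they stay valid if the masking format changes.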

     title = (
         calljson["talkgroup_description"]
         + " @ "
@@ -308,16 +306,20 @@ def main():
     # Send the json and audiofile to a function to transcribe
     # If TTT_DEEPGRAM_KEY is set, use deepgram, else
     # if TTT_WHISPER_URL is set, use whisper.cpp else
-    # fasterwhisper
+    # transformers
     if os.environ.get("TTT_DEEPGRAM_KEY", False):
         calljson = transcribe_deepgram(calljson, audiofile)
     elif os.environ.get("TTT_WHISPERCPP_URL", False):
         calljson = transcribe_whispercpp(calljson, audiofile)
     else:
         calljson = transcribe_transformers(calljson, audiofile)

-    # Ok, we have text back, send for notification
-    send_notifications(calljson, audiofile, destinations)
+    # When Whisper process a file with no speech, it tends to spit out "you"
+    # Just "you" and nothing else.
+    # So if the transcript is just "you", don't bother sending the notification,
+    # we will just delete the files and keep going to the next call.
+    if calljson["text"] != "you":
+        send_notifications(calljson, audiofile, destinations)
Comment on lines +321 to +322

suggestion (testing): Missing test for the new condition to filter out empty notifications.

It's important to add a test to verify that notifications are not sent when the transcript is just 'you'. This test should cover scenarios where the transcript is exactly 'you', contains 'you' with additional text, and completely different text to ensure the condition works as expected.
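
The three scenarios listed above could be covered roughly as follows. This sketch assumes the condition is pulled out into a small helper, `should_notify`, which is a hypothetical name and not part of the PR. The comparison itself mirrors the one in the diff.

```python
import unittest


def should_notify(calljson: dict) -> bool:
    """Hypothetical helper wrapping the PR's check: skip a lone "you" transcript."""
    return calljson["text"] != "you"


class ShouldNotifyTests(unittest.TestCase):
    def test_exactly_you_is_skipped(self):
        self.assertFalse(should_notify({"text": "you"}))

    def test_you_with_more_text_is_sent(self):
        self.assertTrue(should_notify({"text": "you are clear to proceed"}))

    def test_other_text_is_sent(self):
        self.assertTrue(should_notify({"text": "Engine 5 responding"}))

    def test_variants_are_not_matched(self):
        # The comparison is exact, so casing or whitespace differences
        # still result in a notification.
        self.assertTrue(should_notify({"text": " You "}))
```

The last test documents a property of the exact comparison rather than a requirement. If a transcription backend returns the placeholder with different casing or surrounding whitespace, the check as written would let it through, and whether to normalize first is a separate design decision.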

     # And now delete the files from the transcribe directory
     try: