LLM Error & Failure Reporting #692

Open
wants to merge 9 commits into
base: master

sjd210
Contributor

@sjd210 sjd210 commented Apr 25, 2025

This PR covers a few different related changes:

  • If an error occurs within the OpenAI client (e.g. an out-of-date key, a timeout, or running out of credits), we currently return a zero-mark answer to the user without indicating that an error occurred. This is potentially very confusing: the error has (almost certainly) nothing to do with the user's answer, but they are led to believe it has been marked wrong. To be consistent with the symbolic validator when such an error occurs, we now throw a ValidatorUnavailableException and consequently show a "Your answer could not be checked. Please try again." popup in the frontend.
  • Null checking for maxMarks, with appropriate feedback (adapted from Add null checking to LLM Free Text Questions #691)
  • Some wording changes for comments and error clarity, such as stating the current maximum answer length when exceeded
  • More tests for various erroneous behaviours
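
For illustration, a minimal sketch of the behaviour described in the first bullet. `ValidatorUnavailableException` is the exception named above (redeclared here only so the sketch compiles on its own); `LlmClient`, `LlmClientException` and `MarkingSketch` are hypothetical stand-ins and not the actual classes touched by this PR.

```java
// Redeclared locally so the sketch is self-contained; the real class lives in the project.
class ValidatorUnavailableException extends Exception {
    ValidatorUnavailableException(String message) {
        super(message);
    }
}

// Hypothetical wrapper for any client-side failure (expired key, timeout, no credits).
class LlmClientException extends RuntimeException {
    LlmClientException(String message) {
        super(message);
    }
}

// Hypothetical minimal client: returns the raw model reply for a user answer.
interface LlmClient {
    String complete(String answer);
}

class MarkingSketch {
    /**
     * Returns the raw reply from the client. A client failure is unrelated to the
     * user's answer, so it is surfaced as ValidatorUnavailableException instead of
     * being converted into a zero-mark result.
     */
    static String requestMarking(LlmClient client, String answer)
            throws ValidatorUnavailableException {
        try {
            return client.complete(answer);
        } catch (LlmClientException e) {
            throw new ValidatorUnavailableException(
                    "Your answer could not be checked. Please try again.");
        }
    }
}
```

The point of the sketch is only the shape of the error path: the caller can tell "the validator was unavailable" apart from "the answer earned no marks".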

Note: I have kept the default zero-mark result for cases where the OpenAI client responds successfully but not in the format we expect (either not JSON, or more than one message). These cases are not independent of the user's answer, since they may have been caused by prompt injection or other unusual input, and we still want such answers to receive no marks. In these cases I believe a zero mark is more helpful than an error, although I'm open to suggestions.
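
A rough sketch of that fallback, to make the distinction concrete. Everything here is an assumption made for illustration: the `FallbackSketch` class, the `"marks"` field name, and the regular expression, which is a crude stand-in for a real JSON parser. It is not the parsing code in this PR.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FallbackSketch {
    // Crude stand-in for JSON parsing: accepts only an object with a single
    // integer field named "marks" (a hypothetical field name).
    private static final Pattern MARKS_OBJECT =
            Pattern.compile("^\\s*\\{\\s*\"marks\"\\s*:\\s*(\\d{1,3})\\s*\\}\\s*$");

    /**
     * Returns the marks from a well-formed single-message reply, or zero when the
     * reply arrived but does not have the expected shape.
     */
    static int marksFrom(List<String> messages) {
        if (messages.size() != 1) {
            return 0; // no message or several messages: unexpected, award nothing
        }
        Matcher matcher = MARKS_OBJECT.matcher(messages.get(0));
        if (!matcher.matches()) {
            return 0; // not in the expected format: award nothing
        }
        return Integer.parseInt(matcher.group(1));
    }
}
```

Unlike a client failure, a malformed reply may depend on what the user typed, which is why this path returns zero marks and does not raise an error.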

sjd210 added 3 commits April 24, 2025 15:27
Previously this was returning a zero mark response, with no user-facing warning to indicate that it wasn't an issue with their answer

codecov bot commented Apr 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.88%. Comparing base (ea71def) to head (87f49f7).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #692      +/-   ##
==========================================
+ Coverage   35.76%   35.88%   +0.12%     
==========================================
  Files         529      529              
  Lines       23478    23484       +6     
  Branches     2849     2849              
==========================================
+ Hits         8396     8427      +31     
+ Misses      14222    14196      -26     
- Partials      860      861       +1     

@sjd210 sjd210 marked this pull request as ready for review April 29, 2025 13:26
@sjd210 sjd210 changed the title from "LLM Error/Failure Reporting" to "LLM Error & Failure Reporting" Apr 29, 2025
@barna-isaac barna-isaac self-requested a review May 2, 2025 10:59
2 participants