LLM Error & Failure Reporting #692

sjd210 · 2025-04-25T14:04:36Z

This PR covers a few different related changes:

- If there has been an error within the OpenAI client (e.g. an out-of-date key, a timeout, or running out of credits), we currently return a zero-mark answer to the user without indicating that an error occurred. This is potentially very confusing: the error almost certainly has nothing to do with the user's answer, yet they are led to believe it has been marked wrong. To be consistent with the symbolic validator, we now throw a ValidatorUnavailableException when such an error occurs, which shows a "Your answer could not be checked. Please try again." popup in the frontend.
- Null checking for maxMarks, with appropriate feedback (adapted from "Add null checking to LLM Free Text Questions" #691)
- Some wording changes to comments and error messages for clarity, such as stating the current maximum answer length when it is exceeded
- More tests for various erroneous behaviours
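The first two items (turning a client failure into ValidatorUnavailableException, and returning feedback for a missing maxMarks or an over-long answer) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: LlmClient, precheck, callModel and MAX_ANSWER_LENGTH are hypothetical names, the length limit and feedback wording are placeholders, and ValidatorUnavailableException is redeclared locally so the example compiles on its own.

```java
import java.io.IOException;
import java.util.Optional;

// Sketch only: LlmClient, precheck, callModel and MAX_ANSWER_LENGTH are hypothetical,
// and ValidatorUnavailableException is a local stand-in for the project's exception.
public class LlmMarkingSketch {

    static class ValidatorUnavailableException extends Exception {
        ValidatorUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical wrapper around the OpenAI client call. */
    interface LlmClient {
        String complete(String prompt) throws Exception;
    }

    static final int MAX_ANSWER_LENGTH = 4096; // placeholder, not the real limit

    /** Returns user-facing feedback if the answer cannot be sent for marking, else empty. */
    static Optional<String> precheck(String answer, Integer maxMarks) {
        if (maxMarks == null) {
            // Question has no maxMarks: return feedback rather than attempting to mark.
            return Optional.of("This question cannot be marked automatically at the moment.");
        }
        if (answer.length() > MAX_ANSWER_LENGTH) {
            // State the current limit so the user knows how much to cut.
            return Optional.of("Your answer is too long. The maximum length is "
                    + MAX_ANSWER_LENGTH + " characters.");
        }
        return Optional.empty();
    }

    /** Calls the model; any client failure becomes ValidatorUnavailableException, not a zero mark. */
    static String callModel(LlmClient client, String prompt) throws ValidatorUnavailableException {
        try {
            return client.complete(prompt);
        } catch (Exception e) {
            // Expired key, timeout, no credits: none of these say anything about the answer.
            throw new ValidatorUnavailableException(
                    "Your answer could not be checked. Please try again.", e);
        }
    }

    public static void main(String[] args) {
        LlmClient failing = prompt -> { throw new IOException("timeout"); };
        try {
            callModel(failing, "my answer");
        } catch (ValidatorUnavailableException e) {
            System.out.println("Unavailable, cause: " + e.getCause().getMessage());
        }
        System.out.println(precheck("my answer", null).isPresent()); // true
    }
}
```

Keeping the prechecks separate from the model call means configuration and input problems produce ordinary feedback, while only genuine service failures reach the "could not be checked" popup.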

To note: I have left default zero-mark results in place for cases where the OpenAI client responds successfully but not in the format we expect (either not JSON, or using more than one message). These cases are not independent of the user's answer, since they may have been caused by prompt injection or other unusual input, and we want such answers to still receive no marks. In these cases I believe a zero mark is more helpful than an error, although I'm open to suggestions.
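The distinction drawn here, between a failed request (an error) and a successful but malformed reply (zero marks), might look like the sketch below. The names are hypothetical: marksFromReply is illustrative, parseMarks stands in for whatever JSON parsing the real marker does, and the toy parser used in main only accepts a bare integer so that the example needs no JSON library.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch only: a reply that arrived but is malformed yields zero marks rather than an error.
// parseMarks is a hypothetical stand-in for the real JSON parsing step.
public class MalformedReplySketch {

    static int marksFromReply(List<String> messages, Function<String, Optional<Integer>> parseMarks) {
        // Anything other than exactly one message is not the shape we asked for: zero marks.
        if (messages.size() != 1) {
            return 0;
        }
        // Unparseable content (e.g. not JSON) may come from prompt injection: also zero marks.
        return parseMarks.apply(messages.get(0)).orElse(0);
    }

    /** Toy parser for the demo only: accepts a bare integer such as "2". */
    static Optional<Integer> toyParse(String content) {
        try {
            return Optional.of(Integer.parseInt(content.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(marksFromReply(List.of("2"), MalformedReplySketch::toyParse));        // 2
        System.out.println(marksFromReply(List.of("not json"), MalformedReplySketch::toyParse)); // 0
        System.out.println(marksFromReply(List.of("1", "2"), MalformedReplySketch::toyParse));   // 0
    }
}
```

Because this path never throws, a malformed reply is indistinguishable to the user from an answer that earned no marks, which is the behaviour argued for above.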


codecov · 2025-04-25T14:07:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.88%. Comparing base (ea71def) to head (87f49f7).
Report is 15 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #692      +/-   ##
==========================================
+ Coverage   35.76%   35.88%   +0.12%     
==========================================
  Files         529      529              
  Lines       23478    23484       +6     
  Branches     2849     2849              
==========================================
+ Hits         8396     8427      +31     
+ Misses      14222    14196      -26     
- Partials      860      861       +1
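The two percentages follow from the counts in the table if coverage is taken as hits over all coverable lines (hits + misses + partials), which is how Codecov defines its coverage ratio:

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{base} &= \frac{8396}{8396 + 14222 + 860} = \frac{8396}{23478} \approx 35.76\% \\
\text{head} &= \frac{8427}{8427 + 14196 + 861} = \frac{8427}{23484} \approx 35.88\%
\end{aligned}
```

The difference, 35.88% − 35.76%, is the reported +0.12 percentage points.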


src/test/java/uk/ac/cam/cl/dtg/isaac/api/QuestionFacadeTest.java

barna-isaac · 2025-05-06T11:02:10Z

Looks great, and I love that we've improved our test coverage! I'll go ahead and merge this.

sjd210 added 3 commits April 24, 2025 15:27

Allow LLM subject config to be unset safely

ed654ec

Throw exception on OpenAI failure

66f7140

Previously this was returning a zero mark response, with no user-facing warning to indicate that it wasn't an issue with their answer

Update tests to allow exceptions

3b07d56

sjd210 added 3 commits April 28, 2025 09:32

Add warning for missing maxMarks

d1a43c0

Reconfigure Question Facade test file

68a3bf2

Add tests for asserting LLM-answerability

ae38348

github-advanced-security bot found potential problems Apr 28, 2025


src/test/java/uk/ac/cam/cl/dtg/isaac/api/QuestionFacadeTest.java (fixed)

sjd210 added 2 commits April 29, 2025 13:54

Replace deprecated exception method

c847bc6

Reconfigure new validator tests

2d17351

sjd210 marked this pull request as ready for review April 29, 2025 13:26

sjd210 changed the title ~~LLM Error/Failure Reporting~~ LLM Error & Failure Reporting Apr 29, 2025

simplify setting default empty value for subject

87f49f7
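The subject-config commits ("Allow LLM subject config to be unset safely" and this simplification) amount to treating a missing subject as an empty string. A minimal sketch, assuming the setting is read from java.util.Properties under a hypothetical key name; the project's actual configuration mechanism and key may differ:

```java
import java.util.Properties;

// Sketch only: LLM_SUBJECT_KEY is a hypothetical key name, and Properties is an
// assumed stand-in for the project's real configuration source.
public class SubjectConfigSketch {

    static final String LLM_SUBJECT_KEY = "LLM_MARKER_SUBJECT"; // hypothetical

    /** Returns the configured subject, or "" when the setting is absent. */
    static String subject(Properties config) {
        // getProperty(key, defaultValue) returns defaultValue when the key is missing.
        return config.getProperty(LLM_SUBJECT_KEY, "");
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println("[" + subject(config) + "]"); // prints "[]"
        config.setProperty(LLM_SUBJECT_KEY, "physics");
        System.out.println(subject(config)); // prints "physics"
    }
}
```

Defaulting at the point of reading means downstream prompt-building code never has to handle a null subject.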

barna-isaac self-requested a review May 2, 2025 10:59

barna-isaac merged commit 1e5c907 into master May 6, 2025
5 checks passed

barna-isaac deleted the improvement/llm-failure-reporting branch May 6, 2025 11:02
