NotifAI Performance Improvements by PaulTaylor · Pull Request #91 · NHSDigital/nhs-notify-admail

PaulTaylor · 2025-12-16T11:11:04Z

Description

Improve performance of the model pipeline to more accurately classify documents as admail/business.

Add a docker compose stack for local development
Update the system prompt
Remove text extraction and use multi-modality in llm instead
Switch the default model to Nova-Lite
Update evaluation framework to account for new pipeline
Update dependencies to fix vulnerabilities (for packages used in our code, there are some dependabot issues that may remain in the wider template (eg. in docs))
Disable the schedule for the automated evaluation as it always returns the same result from the same model/docs

Context

The current model and prompt are generating false-negatives for letters that should be classified as admail. Misclassifying letters that are appropriate for admail as business incurs unnecessary postage costs.

Type of changes

Refactoring (non-breaking change)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would change existing functionality)
Bug fix (non-breaking change which fixes an issue)

Checklist

I am familiar with the contributing guidelines
I have followed the code style of the project
I have added tests to cover my changes
I have updated the documentation accordingly
This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others privacy, we kindly ask you to NOT including PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) and the codebase changes. We will remove any PR that do contain any sensitive information. We really appreciate your cooperation in this matter.

I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

src/backend/bedrock-prompt-messager/services/bedrock_service.py

src/backend/bedrock-prompt-messager/api_wrapper.py

* Reformatted advice section to remove brackets * Removed content improvement and design best practice advice - can be reenabled by using system_prompt_extended.txt * Switched to using British English spellings

docker-compose.yaml

Copilot

Pull request overview

This PR updates the NotifAI document-classification pipeline to improve performance and accuracy by switching to a multimodal Bedrock flow (sending documents directly to the LLM), updating prompts/models, refreshing the evaluation/alerting pipeline, and adding local Docker tooling.

Changes:

Replace text extraction with multimodal document input (frontend + Bedrock prompt service).
Rework automated evaluation runner to run direct converse calls and adjust alert scoring to match the new output format.
Add/adjust local Docker + dependency hardening (nginx health endpoint, npm overrides, etc.), and update infra to Nova-Lite + scheduling changes.

Reviewed changes

Copilot reviewed 41 out of 45 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
tools/prompt-input-generator/prompt-input-generator.py	Generates JSONL prompts from S3 documents for the new multimodal evaluation format.
tools/prompt-input-generator/env.example	Updates generator env vars to S3-based inputs and Nova-Lite defaults.
src/frontend/notifai-uploader/src/components/FileUpload.js	Switches upload flow to DataURL (base64) for multimodal input instead of `/convert`.
src/frontend/notifai-uploader/src/components/AIfeedback.js	Tweaks markdown/text cleanup for LLM feedback rendering.
src/frontend/notifai-uploader/src/api/BackendAPIClient.js	Adjusts baseURL construction for backend API config.
src/frontend/notifai-uploader/src/Pages/FileUploadPage.js	Removes page-count handling and wires new upload payload through to the LLM call.
src/frontend/notifai-uploader/package.json	Adds dependency overrides for security/vulnerability mitigation.
src/frontend/notifai-uploader/package-lock.json	Locks updated dependency graph consistent with overrides.
src/frontend/notifai-uploader/nginx.conf	Adds a `/health` endpoint for container health checks.
src/frontend/notifai-uploader/Dockerfile	Updates nginx base and config placement; adds `apk upgrade`.
src/frontend/notifai-uploader/.dockerignore	Introduces docker ignore rules for frontend build context.
src/backend/bedrock_evaluations_runner/tests/test_bedrock_evaluation_service.py	Updates evaluation runner tests for the new programmatic evaluation path.
src/backend/bedrock_evaluations_runner/tests/evaluation_example.json	Adds a sample JSONL evaluation record using multimodal `document` input.
src/backend/bedrock_evaluations_runner/constants.py	Centralizes tool schema constants and common error messages.
src/backend/bedrock_evaluations_runner/bedrock_evaluation_service.py	Replaces Bedrock “evaluation job” API usage with direct `converse` execution over JSONL prompts.
src/backend/bedrock_evaluations_runner/bedrock_evaluation_lambda.py	Updates Lambda handler to use new evaluation runner signature and flow.
src/backend/bedrock_evaluations_runner/.env-example	Updates default model identifiers and region example values.
src/backend/bedrock_alerts/tests/test_evaluations_alert_service.py	Updates alert scoring tests to match new result record format.
src/backend/bedrock_alerts/evaluations_alert_service.py	Changes scoring logic from Bedrock eval-job metric output to `rating` vs `actualClass`.
src/backend/bedrock_alerts/evaluations_alert_lambda.py	Adds logging and adjusts control flow for alert triggering.
src/backend/bedrock-prompt-messager/tests/services/test_bedrock_service.py	Extends tests to cover multimodal document inputs and mime handling.
src/backend/bedrock-prompt-messager/system_prompt.txt	Updates the system prompt for improved NHS-specific classification and effectiveness advice.
src/backend/bedrock-prompt-messager/services/bedrock_service.py	Implements Data URL parsing and sends document bytes via Bedrock multimodal `document` content.
src/backend/bedrock-prompt-messager/api_wrapper.py	Adds a Flask wrapper for local HTTP testing of the Lambda handler.
src/backend/bedrock-prompt-messager/Dockerfile	Adds a containerized local runner exposing the Flask wrapper.
src/backend/app/tests/services/test_convert_service.py	Removes tests for the deprecated `/convert` text-extraction pipeline.
src/backend/app/tests/routers/test_convert_router.py	Removes `/convert` router tests.
src/backend/app/services/convert_service.py	Removes the convert service (pandoc/pdf extraction) implementation.
src/backend/app/routers/convert.py	Removes the `/convert` endpoint router.
src/backend/app/requirements.txt	Drops pandoc/pypdf and updates boto3 dependency flavor.
src/backend/app/main.py	Expands allowed CORS origins for local dev and adds `/health`.
src/backend/app/core/auth.py	Exempts `/health` from authentication middleware.
src/backend/app/Dockerfile	Updates FastAPI CLI invocation for container startup.
scripts/config/sonar-scanner.properties	Adjusts sonar exclusions/coverage exclusions for new local wrapper artifacts.
infrastructure/terraform/etc/group_nhs-notify-poc001.tfvars	Updates log retention setting.
infrastructure/terraform/etc/env_eu-west-2_test.tfvars	Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/etc/env_eu-west-2_dev2.tfvars	Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/etc/env_eu-west-2_dev1.tfvars	Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/components/notifyai/resources/prompt-data/letter-responses.json	Removes legacy ground-truth response mapping used by the old pipeline.
infrastructure/terraform/components/notifyai/lambda.tf	Updates Bedrock model permissions toward Nova-Lite.
infrastructure/terraform/components/notifyai/eventbridge.tf	Alters scheduled evaluation behavior via an explicit scheduler end date.
infrastructure/terraform/components/notifyai/bedrock.tf	Updates Bedrock access policy resources to Nova-Lite inference profile.
docker-compose.yaml	Adds a local dev stack for frontend, backend, and bedrock-prompt-messager.
.gitignore	Ignores additional build artifacts/caches.

Files not reviewed (1)

src/frontend/notifai-uploader/package-lock.json: Language not supported

Comments suppressed due to low confidence (1)

src/backend/bedrock_evaluations_runner/tests/test_bedrock_evaluation_service.py:78

run_evaluation_job no longer accepts an evaluator_model argument, but this test still passes evaluator_model="eval-model", which will raise TypeError and won’t exercise the intended failure path. Update the test to call the new signature and to trigger failure via the new code path (e.g., mock get_object/converse/put_object to raise).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-03T10:18:17Z

src/frontend/notifai-uploader/src/components/FileUpload.js

      const formData = new FormData();
      formData.append("file", file);



formData is created and file is appended, but the data is never sent anywhere now that the /convert request was removed. This is dead code and (together with the now-unused API client) can trigger lint failures; please remove it or reintroduce the upload call.

Copilot · 2026-03-03T10:18:18Z

src/backend/bedrock_evaluations_runner/bedrock_evaluation_service.py

+    def __init__(self, region: str, resource_prefix: str):
+        if not all([region]):
            raise ValueError("Region, Role ARN and resource prefix must be provided.")



The constructor now only validates region, but the error message still mentions “Role ARN and resource prefix” and resource_prefix can be None without failing fast. Please update the validation to check resource_prefix too and correct the error message to match the required args.

Copilot · 2026-03-03T10:18:18Z

src/backend/bedrock_evaluations_runner/bedrock_evaluation_service.py

+            output = io.BytesIO()
+            jsonlines.Writer(output).write_all(responses)
+            s3_out_uri = up.urlparse(output_s3_uri)
+
+            return self.s3_client.put_object(
+                Bucket=s3_out_uri.hostname,
+                Key=s3_out_uri.path[1:],
+                Body=output.getvalue(),
            )


put_object uses Key=s3_out_uri.path[1:], so if output_s3_uri is a prefix like s3://bucket/results/ (as in the env example), this will write to the key results/ (no filename/extension) and will overwrite the same object every run. It also won’t match the alerts job which searches for keys ending with _output.jsonl. Consider appending a generated filename (e.g., ${job_name}_output.jsonl) under the provided prefix.

Copilot · 2026-03-03T10:18:18Z

src/backend/bedrock_evaluations_runner/bedrock_evaluation_service.py

+            output = io.BytesIO()
+            jsonlines.Writer(output).write_all(responses)
+            s3_out_uri = up.urlparse(output_s3_uri)


jsonlines.Writer(output) is given an io.BytesIO(). jsonlines.Writer typically expects a text stream (e.g., io.StringIO() or a TextIOWrapper), which can raise TypeError or write incorrect data. Consider using io.StringIO() (then .getvalue().encode('utf-8')) or wrapping BytesIO with io.TextIOWrapper.

Copilot · 2026-03-03T10:18:18Z

src/backend/bedrock-prompt-messager/tests/services/test_bedrock_service.py

+        with patch.object(bedrock_service, "log_prompt_details_to_s3") as mock_log:
+            result = bedrock_service.call_admail_bedrock_prompt(
+                f"data:fake/type;base64,{base64.b64encode(b"test letter")}", "test_file.json"
+            )
+            assert result["statusCode"] == 400


Same issue here: the f-string has nested double-quotes and will not parse, and the base64 payload should be decoded to a text string when constructing a data URL. Please fix this test input construction (e.g., use single quotes and .decode('utf-8')).

Copilot · 2026-03-03T10:18:20Z

src/backend/app/Dockerfile

@@ -12,4 +10,4 @@ COPY . .

 EXPOSE 8080


The container runs fastapi run /app/main.py, but the code imports using from app... (expecting app to be a package on PYTHONPATH, as configured in pytest.ini). In this image, /app is the working directory, so app typically won’t be importable unless PYTHONPATH includes the parent directory. Consider switching to module syntax (fastapi run app.main:app) and/or setting ENV PYTHONPATH=/ (or restructuring the container paths) so imports resolve consistently.

Suggested change

EXPOSE 8080

EXPOSE 8080

ENV PYTHONPATH=/

Copilot · 2026-03-03T10:18:21Z

infrastructure/terraform/components/notifyai/lambda.tf

@@ -67,7 +67,7 @@ data "aws_iam_policy_document" "bedrock_access" {
      "${aws_s3_bucket.lambda_prompt_logging_s3_bucket.arn}/${local.s3_lambda_logging_key}*",
      "arn:aws:bedrock:${var.region}:${var.aws_account_id}:guardrail/*",
      "arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-pro-v1:*",


IAM policy still grants access to the Nova Pro inference profile (...inference-profile/eu.amazon.nova-pro-v1:*) while the PR switches the deployed model to Nova Lite. If the runtime uses inference profiles, this mismatch can cause AccessDenied when invoking Nova Lite. Consider updating/removing the Nova Pro inference-profile ARN and adding the corresponding Nova Lite inference-profile ARN.

Suggested change

"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-pro-v1:*",

"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-lite-v1:*",

Copilot · 2026-03-03T10:18:21Z

infrastructure/terraform/components/notifyai/eventbridge.tf

      role_arn            = aws_iam_role.eventbridge_scheduler_role.arn
+
+      end_date = "2025-12-16T00:00:00Z"
    }


Hard-coding end_date = "2025-12-16T00:00:00Z" permanently stops the scheduled evaluation after that date (and it’s already in the past as of now), regardless of evaluation-schedule-days. If the intent is to disable scheduling, it’s clearer/safer to disable the schedule explicitly (e.g., via a variable/flag) or remove the schedule block, rather than relying on a fixed timestamp.

Copilot · 2026-03-03T10:18:21Z

tools/prompt-input-generator/prompt-input-generator.py

+documents = [
+    obj['Key'] for obj in S3.list_objects_v2(Bucket=INPUT_BUCKET)['Contents']
+    if obj['Key'].endswith(("docx","csv","html","txt","pdf","md"," oc","xlsx","xls"))


list_objects_v2(...)["Contents"] will raise KeyError when the bucket/prefix is empty (AWS omits Contents). Consider using .get("Contents", []) (and optionally handling pagination via IsTruncated/ContinuationToken) so the script behaves predictably for empty/large buckets.

Copilot · 2026-03-03T10:18:21Z

docker-compose.yaml

+    entrypoint:
+      [
+        "fastapi",
+        "dev",
+        "/app/main.py",
+        "--port",
+        "8080",
+        "--host",
+        "0.0.0.0",
+        "--reload",
+      ]


The backend is started with fastapi dev /app/main.py, but main.py imports modules as from app.... In this container layout, that typically requires PYTHONPATH to include the parent directory of the app package (mirroring pytest.ini: pythonpath = src/backend). Without setting PYTHONPATH (or switching to module-style startup like fastapi dev app.main:app), the backend may fail to start with ModuleNotFoundError: No module named 'app'.

github-advanced-security bot found potential problems Dec 16, 2025

View reviewed changes

src/backend/bedrock-prompt-messager/services/bedrock_service.py Fixed Show fixed Hide fixed

src/backend/bedrock-prompt-messager/api_wrapper.py Fixed Show fixed Hide fixed

src/backend/bedrock-prompt-messager/api_wrapper.py Fixed Show fixed Hide fixed

joeds13 and others added 10 commits December 16, 2025 11:12

chore: enable local dev with a docker compose based stack

37dd81b

update app to send document as bytes instead of doing text conversion

0961f90

updated system prompt

5e63b2c

fix bedrock service tests

aae4b06

fix markdown rendering issue on the client with the updated prompt

30825d3

update evaluation mechanisms for multimodal prompting

2fd9889

Remove pandoc and pypdf as no longer required

31409c9

switch default model to nova-lite

d763608

ignore terraform plugin-cache and built evaluation_alerts.zip

e59d873

fix mixed http/https issue in the frontend when retriving history

c87b8ef

PaulTaylor force-pushed the feat/performance-improvements branch from 35c2c4a to c87b8ef Compare December 16, 2025 11:14

PaulTaylor added 2 commits December 16, 2025 11:31

resolve CodeQL issues

e942808

fix whitespace issues

5a6bedc

PaulTaylor force-pushed the feat/performance-improvements branch from 3d49bd0 to 5a6bedc Compare December 16, 2025 11:36

PaulTaylor added 5 commits December 16, 2025 12:08

frontend dependency updates to address vulnerabilities

7b3dd0e

fix unit tests for evaluation service

f195466

exclude local docker runner from sonar coverage analysis

35866ef

improve test coverage in bedrock_service.py

42a43e0

exclude local development wrapper from sonor analysis

4729bdc

PaulTaylor force-pushed the feat/performance-improvements branch from 2a9b3c2 to 4729bdc Compare December 16, 2025 15:07

PaulTaylor marked this pull request as ready for review December 16, 2025 15:43

PaulTaylor requested a review from a team as a code owner December 16, 2025 15:43

disable automated evaluations as they always give the same result

e26288b

PaulTaylor force-pushed the feat/performance-improvements branch from 3a365b2 to e26288b Compare December 16, 2025 16:03

remove some stray print statements remaining from debugging

baff263

PaulTaylor force-pushed the feat/performance-improvements branch from 370b602 to baff263 Compare December 17, 2025 08:41

fixes from user feedback

01de1cf

* Reformatted advice section to remove brackets * Removed content improvement and design best practice advice - can be reenabled by using system_prompt_extended.txt * Switched to using British English spellings

timireland reviewed Feb 27, 2026

View reviewed changes

docker-compose.yaml Outdated Show resolved Hide resolved

docker-compose.yaml Outdated Show resolved Hide resolved

docker-compose.yaml Outdated Show resolved Hide resolved

remove hardcoded aws references from compose file

ea58e17

timireland requested a review from Copilot March 3, 2026 10:06

Copilot started reviewing on behalf of timireland March 3, 2026 10:06 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

		const formData = new FormData();
		formData.append("file", file);

	"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-pro-v1:*",
	"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-lite-v1:*",

Conversation

PaulTaylor commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Context

Type of changes

Checklist

Sensitive Information Declaration

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

PaulTaylor commented Dec 16, 2025 •

edited

Loading