Skip to content

NotifAI Performance Improvements#91

Open
PaulTaylor wants to merge 21 commits intomainfrom
feat/performance-improvements
Open

NotifAI Performance Improvements#91
PaulTaylor wants to merge 21 commits intomainfrom
feat/performance-improvements

Conversation

@PaulTaylor
Copy link

@PaulTaylor PaulTaylor commented Dec 16, 2025

Description

Improve performance of the model pipeline to more accurately classify documents as admail/business.

  • Add a docker compose stack for local development
  • Update the system prompt
  • Remove text extraction and use multi-modality in llm instead
  • Switch the default model to Nova-Lite
  • Update evaluation framework to account for new pipeline
  • Update dependencies to fix vulnerabilities (for packages used in our code, there are some dependabot issues that may remain in the wider template (eg. in docs))
  • Disable the schedule for the automated evaluation as it always returns the same result from the same model/docs

Context

The current model and prompt are generating false-negatives for letters that should be classified as admail. Misclassifying letters that are appropriate for admail as business incurs unnecessary postage costs.

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others privacy, we kindly ask you to NOT including PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) and the codebase changes. We will remove any PR that do contain any sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

@PaulTaylor PaulTaylor force-pushed the feat/performance-improvements branch from 35c2c4a to c87b8ef Compare December 16, 2025 11:14
@PaulTaylor PaulTaylor force-pushed the feat/performance-improvements branch from 3d49bd0 to 5a6bedc Compare December 16, 2025 11:36
@PaulTaylor PaulTaylor force-pushed the feat/performance-improvements branch from 2a9b3c2 to 4729bdc Compare December 16, 2025 15:07
@PaulTaylor PaulTaylor marked this pull request as ready for review December 16, 2025 15:43
@PaulTaylor PaulTaylor requested a review from a team as a code owner December 16, 2025 15:43
@PaulTaylor PaulTaylor force-pushed the feat/performance-improvements branch from 3a365b2 to e26288b Compare December 16, 2025 16:03
@PaulTaylor PaulTaylor force-pushed the feat/performance-improvements branch from 370b602 to baff263 Compare December 17, 2025 08:41
* Reformatted advice section to remove brackets
* Removed content improvement and design best practice advice - can be reenabled by using system_prompt_extended.txt
* Switched to using British English spellings
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the NotifAI document-classification pipeline to improve performance and accuracy by switching to a multimodal Bedrock flow (sending documents directly to the LLM), updating prompts/models, refreshing the evaluation/alerting pipeline, and adding local Docker tooling.

Changes:

  • Replace text extraction with multimodal document input (frontend + Bedrock prompt service).
  • Rework automated evaluation runner to run direct converse calls and adjust alert scoring to match the new output format.
  • Add/adjust local Docker + dependency hardening (nginx health endpoint, npm overrides, etc.), and update infra to Nova-Lite + scheduling changes.

Reviewed changes

Copilot reviewed 41 out of 45 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
tools/prompt-input-generator/prompt-input-generator.py Generates JSONL prompts from S3 documents for the new multimodal evaluation format.
tools/prompt-input-generator/env.example Updates generator env vars to S3-based inputs and Nova-Lite defaults.
src/frontend/notifai-uploader/src/components/FileUpload.js Switches upload flow to DataURL (base64) for multimodal input instead of /convert.
src/frontend/notifai-uploader/src/components/AIfeedback.js Tweaks markdown/text cleanup for LLM feedback rendering.
src/frontend/notifai-uploader/src/api/BackendAPIClient.js Adjusts baseURL construction for backend API config.
src/frontend/notifai-uploader/src/Pages/FileUploadPage.js Removes page-count handling and wires new upload payload through to the LLM call.
src/frontend/notifai-uploader/package.json Adds dependency overrides for security/vulnerability mitigation.
src/frontend/notifai-uploader/package-lock.json Locks updated dependency graph consistent with overrides.
src/frontend/notifai-uploader/nginx.conf Adds a /health endpoint for container health checks.
src/frontend/notifai-uploader/Dockerfile Updates nginx base and config placement; adds apk upgrade.
src/frontend/notifai-uploader/.dockerignore Introduces docker ignore rules for frontend build context.
src/backend/bedrock_evaluations_runner/tests/test_bedrock_evaluation_service.py Updates evaluation runner tests for the new programmatic evaluation path.
src/backend/bedrock_evaluations_runner/tests/evaluation_example.json Adds a sample JSONL evaluation record using multimodal document input.
src/backend/bedrock_evaluations_runner/constants.py Centralizes tool schema constants and common error messages.
src/backend/bedrock_evaluations_runner/bedrock_evaluation_service.py Replaces Bedrock “evaluation job” API usage with direct converse execution over JSONL prompts.
src/backend/bedrock_evaluations_runner/bedrock_evaluation_lambda.py Updates Lambda handler to use new evaluation runner signature and flow.
src/backend/bedrock_evaluations_runner/.env-example Updates default model identifiers and region example values.
src/backend/bedrock_alerts/tests/test_evaluations_alert_service.py Updates alert scoring tests to match new result record format.
src/backend/bedrock_alerts/evaluations_alert_service.py Changes scoring logic from Bedrock eval-job metric output to rating vs actualClass.
src/backend/bedrock_alerts/evaluations_alert_lambda.py Adds logging and adjusts control flow for alert triggering.
src/backend/bedrock-prompt-messager/tests/services/test_bedrock_service.py Extends tests to cover multimodal document inputs and mime handling.
src/backend/bedrock-prompt-messager/system_prompt.txt Updates the system prompt for improved NHS-specific classification and effectiveness advice.
src/backend/bedrock-prompt-messager/services/bedrock_service.py Implements Data URL parsing and sends document bytes via Bedrock multimodal document content.
src/backend/bedrock-prompt-messager/api_wrapper.py Adds a Flask wrapper for local HTTP testing of the Lambda handler.
src/backend/bedrock-prompt-messager/Dockerfile Adds a containerized local runner exposing the Flask wrapper.
src/backend/app/tests/services/test_convert_service.py Removes tests for the deprecated /convert text-extraction pipeline.
src/backend/app/tests/routers/test_convert_router.py Removes /convert router tests.
src/backend/app/services/convert_service.py Removes the convert service (pandoc/pdf extraction) implementation.
src/backend/app/routers/convert.py Removes the /convert endpoint router.
src/backend/app/requirements.txt Drops pandoc/pypdf and updates boto3 dependency flavor.
src/backend/app/main.py Expands allowed CORS origins for local dev and adds /health.
src/backend/app/core/auth.py Exempts /health from authentication middleware.
src/backend/app/Dockerfile Updates FastAPI CLI invocation for container startup.
scripts/config/sonar-scanner.properties Adjusts sonar exclusions/coverage exclusions for new local wrapper artifacts.
infrastructure/terraform/etc/group_nhs-notify-poc001.tfvars Updates log retention setting.
infrastructure/terraform/etc/env_eu-west-2_test.tfvars Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/etc/env_eu-west-2_dev2.tfvars Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/etc/env_eu-west-2_dev1.tfvars Switches prompt/eval models to Nova-Lite.
infrastructure/terraform/components/notifyai/resources/prompt-data/letter-responses.json Removes legacy ground-truth response mapping used by the old pipeline.
infrastructure/terraform/components/notifyai/lambda.tf Updates Bedrock model permissions toward Nova-Lite.
infrastructure/terraform/components/notifyai/eventbridge.tf Alters scheduled evaluation behavior via an explicit scheduler end date.
infrastructure/terraform/components/notifyai/bedrock.tf Updates Bedrock access policy resources to Nova-Lite inference profile.
docker-compose.yaml Adds a local dev stack for frontend, backend, and bedrock-prompt-messager.
.gitignore Ignores additional build artifacts/caches.
Files not reviewed (1)
  • src/frontend/notifai-uploader/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

src/backend/bedrock_evaluations_runner/tests/test_bedrock_evaluation_service.py:78

  • run_evaluation_job no longer accepts an evaluator_model argument, but this test still passes evaluator_model="eval-model", which will raise TypeError and won’t exercise the intended failure path. Update the test to call the new signature and to trigger failure via the new code path (e.g., mock get_object/converse/put_object to raise).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 34 to 36
const formData = new FormData();
formData.append("file", file);

Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formData is created and file is appended, but the data is never sent anywhere now that the /convert request was removed. This is dead code and (together with the now-unused API client) can trigger lint failures; please remove it or reintroduce the upload call.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to 18
def __init__(self, region: str, resource_prefix: str):
if not all([region]):
raise ValueError("Region, Role ARN and resource prefix must be provided.")

Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor now only validates region, but the error message still mentions “Role ARN and resource prefix” and resource_prefix can be None without failing fast. Please update the validation to check resource_prefix too and correct the error message to match the required args.

Copilot uses AI. Check for mistakes.
Comment on lines +128 to 136
output = io.BytesIO()
jsonlines.Writer(output).write_all(responses)
s3_out_uri = up.urlparse(output_s3_uri)

return self.s3_client.put_object(
Bucket=s3_out_uri.hostname,
Key=s3_out_uri.path[1:],
Body=output.getvalue(),
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put_object uses Key=s3_out_uri.path[1:], so if output_s3_uri is a prefix like s3://bucket/results/ (as in the env example), this will write to the key results/ (no filename/extension) and will overwrite the same object every run. It also won’t match the alerts job which searches for keys ending with _output.jsonl. Consider appending a generated filename (e.g., ${job_name}_output.jsonl) under the provided prefix.

Copilot uses AI. Check for mistakes.
Comment on lines +128 to +130
output = io.BytesIO()
jsonlines.Writer(output).write_all(responses)
s3_out_uri = up.urlparse(output_s3_uri)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonlines.Writer(output) is given an io.BytesIO(). jsonlines.Writer typically expects a text stream (e.g., io.StringIO() or a TextIOWrapper), which can raise TypeError or write incorrect data. Consider using io.StringIO() (then .getvalue().encode('utf-8')) or wrapping BytesIO with io.TextIOWrapper.

Copilot uses AI. Check for mistakes.
Comment on lines +162 to +166
with patch.object(bedrock_service, "log_prompt_details_to_s3") as mock_log:
result = bedrock_service.call_admail_bedrock_prompt(
f"data:fake/type;base64,{base64.b64encode(b"test letter")}", "test_file.json"
)
assert result["statusCode"] == 400
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: the f-string has nested double-quotes and will not parse, and the base64 payload should be decoded to a text string when constructing a data URL. Please fix this test input construction (e.g., use single quotes and .decode('utf-8')).

Copilot uses AI. Check for mistakes.
@@ -12,4 +10,4 @@ COPY . .

EXPOSE 8080
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container runs fastapi run /app/main.py, but the code imports using from app... (expecting app to be a package on PYTHONPATH, as configured in pytest.ini). In this image, /app is the working directory, so app typically won’t be importable unless PYTHONPATH includes the parent directory. Consider switching to module syntax (fastapi run app.main:app) and/or setting ENV PYTHONPATH=/ (or restructuring the container paths) so imports resolve consistently.

Suggested change
EXPOSE 8080
EXPOSE 8080
ENV PYTHONPATH=/

Copilot uses AI. Check for mistakes.
@@ -67,7 +67,7 @@ data "aws_iam_policy_document" "bedrock_access" {
"${aws_s3_bucket.lambda_prompt_logging_s3_bucket.arn}/${local.s3_lambda_logging_key}*",
"arn:aws:bedrock:${var.region}:${var.aws_account_id}:guardrail/*",
"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-pro-v1:*",
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IAM policy still grants access to the Nova Pro inference profile (...inference-profile/eu.amazon.nova-pro-v1:*) while the PR switches the deployed model to Nova Lite. If the runtime uses inference profiles, this mismatch can cause AccessDenied when invoking Nova Lite. Consider updating/removing the Nova Pro inference-profile ARN and adding the corresponding Nova Lite inference-profile ARN.

Suggested change
"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-pro-v1:*",
"arn:aws:bedrock:${var.region}:${var.aws_account_id}:inference-profile/eu.amazon.nova-lite-v1:*",

Copilot uses AI. Check for mistakes.
Comment on lines 14 to 17
role_arn = aws_iam_role.eventbridge_scheduler_role.arn

end_date = "2025-12-16T00:00:00Z"
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coding end_date = "2025-12-16T00:00:00Z" permanently stops the scheduled evaluation after that date (and it’s already in the past as of now), regardless of evaluation-schedule-days. If the intent is to disable scheduling, it’s clearer/safer to disable the schedule explicitly (e.g., via a variable/flag) or remove the schedule block, rather than relying on a fixed timestamp.

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +39
documents = [
obj['Key'] for obj in S3.list_objects_v2(Bucket=INPUT_BUCKET)['Contents']
if obj['Key'].endswith(("docx","csv","html","txt","pdf","md"," oc","xlsx","xls"))
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_objects_v2(...)["Contents"] will raise KeyError when the bucket/prefix is empty (AWS omits Contents). Consider using .get("Contents", []) (and optionally handling pagination via IsTruncated/ContinuationToken) so the script behaves predictably for empty/large buckets.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +45
entrypoint:
[
"fastapi",
"dev",
"/app/main.py",
"--port",
"8080",
"--host",
"0.0.0.0",
"--reload",
]
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend is started with fastapi dev /app/main.py, but main.py imports modules as from app.... In this container layout, that typically requires PYTHONPATH to include the parent directory of the app package (mirroring pytest.ini: pythonpath = src/backend). Without setting PYTHONPATH (or switching to module-style startup like fastapi dev app.main:app), the backend may fail to start with ModuleNotFoundError: No module named 'app'.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants