Skip to content

typing: graduate opencontractserver.tasks.* from mypy baseline#1541

Open
JSv4 wants to merge 10 commits intomainfrom
claude/fix-issue-1482-8mA5R
Open

typing: graduate opencontractserver.tasks.* from mypy baseline#1541
JSv4 wants to merge 10 commits intomainfrom
claude/fix-issue-1482-8mA5R

Conversation

@JSv4
Copy link
Copy Markdown
Collaborator

@JSv4 JSv4 commented May 5, 2026

Closes #1482. Continues draining the mypy baseline established in #1331 (tracker: #1447).

Summary

Graduates all sixteen opencontractserver.tasks.* modules out of the mypy baseline. The package is now type-clean.

Removed from mypy.ini — sixteen [mypy-…] blocks:
agent_tasks, analyzer_tasks, badge_tasks, corpus_tasks, data_extract_tasks, doc_analysis_tasks, doc_tasks, embeddings_task, export_tasks, export_tasks_v2, extract_orchestrator_tasks, fork_tasks, import_tasks, import_tasks_v2, lookup_tasks, memory_tasks.

Pruned from docs/typing/mypy_baseline.txt — 227 lines (6813 → 6586).

Notable per-file fixes

  • badge_tasks: dropped the User = get_user_model() runtime-call-as-type pattern in favour of a direct from opencontractserver.users.models import User (mirrors the conftest.py pattern used elsewhere in the project).
  • doc_analysis_tasks: filter Anthropic response.content blocks via hasattr(resp, "text") so list comprehensions over the union type are legal; renamed the accumulator to avoid str | list shadowing.
  • analyzer_tasks: tightened request_gremlin_manifest signature to int and coerced add(*doc_ids) to ints since analyzed_documents rejects str.
  • data_extract_tasks: scoped # type: ignore[valid-type] on the output_type = list[output_type] runtime-parametric pattern; scoped # type: ignore[typeddict-item] on two TypedDict(**dict) expansions that mypy cannot prove cover all required keys.
  • doc_tasks: validate FieldFile.name before Storage.open; thread parent_id through the FUNSD annotation dict so it satisfies the TypedDict; cast page_data.bounds to BoundingBoxPythonType; widen annotation_map keys to int | str to match the declared return type; coerce file_type to FileTypeEnum before get_components_by_mimetype.
  • embeddings_task: cast _create_embedding_for_annotation to the HasEmbeddingMixin-typed callable signature expected by _apply_dual_embedding_strategy; explicit result: dict[str, Any] annotation so subscripted update / append cooperate with the mixed-shape result dict.
  • export_tasks / export_tasks_v2: handle None returns from package_corpus_for_export / package_label_set_for_export with RuntimeError before constructing the TypedDict; replace output_bytes (BytesIO) with ContentFile when calling FieldFile.save; rename inner FUNSD page-data variable to keep its tuple[int, str, str] type distinct from the bytes payload; fix UserExport.error typo (the actual field is errors).
  • import_tasks / import_tasks_v2: swap AbstractBaseUser for the project's concrete User (same pattern as corpuses/folder_service); narrow the V2-only branch so data_json.get(...) returns V2-typed values instead of the V1∩V2 object lower bound; widen aggregator dict to dict[str | int, int] to match import_doc_annotations's return type.
  • fork_tasks: replace implicit Optional[list[str]] = None defaults; guard FieldFile.name accesses; annotate empty folder_map / structural_set_map accumulators.
  • corpus_tasks: drop a stale return summary from a -> None task; coerce the str | int Celery args to int once at function entry; rename inner execution variable to avoid shadowing the outer loop binding; scoped # type: ignore[attr-defined] on Celery's dynamically-attached Signature.s / .si attributes.
  • extract_orchestrator_tasks: early-return when extract_id is None; scoped # type: ignore[attr-defined] on task_func.si.

Cross-cutting type tightening

  • LabelLookupPythonType.{text_labels, doc_labels} narrowed from dict[str | int, AnnotationLabelPythonType] to dict[str, ...] (the producer in utils.etl.build_label_lookups already keys by str(pk)).
  • build_label_lookups(corpus_id: ...) changed from str to int to match every call site.
  • utils.importing.load_or_create_labels and utils.importing.import_doc_annotations now accept Mapping instead of dict, so callers can pass OpenContractDocExport / AnnotationLabelPythonType TypedDicts directly without cast().
  • import_tasks._validate_sidecar_schema and _apply_sidecar_annotations accept str | None for sidecar_path since callers may pass a fallback that is None.

Verification

  • python -m mypy --config-file mypy.ini opencontractserver configSuccess: no issues found in 1032 source files.
  • All 16 graduated modules import cleanly under config.settings.mypy.

Test plan

  • CI passes (mypy / lint / backend tests).
  • Spot-check that the baseline file no longer contains any opencontractserver/tasks/ lines: grep -c "opencontractserver/tasks/" docs/typing/mypy_baseline.txt0.
  • Spot-check mypy.ini no longer contains any [mypy-opencontractserver.tasks.*] blocks.

https://claude.ai/code/session_01Mze1ELoLJu4feUTYWFq7Wt


Generated by Claude Code

…#1482)

Continues draining the mypy baseline established in #1331. Graduates all
sixteen task modules out of the baseline.

Removed from mypy.ini
- opencontractserver.tasks.{agent_tasks, analyzer_tasks, badge_tasks,
  corpus_tasks, data_extract_tasks, doc_analysis_tasks, doc_tasks,
  embeddings_task, export_tasks, export_tasks_v2,
  extract_orchestrator_tasks, fork_tasks, import_tasks, import_tasks_v2,
  lookup_tasks, memory_tasks}

Pruned from docs/typing/mypy_baseline.txt
- 227 lines (6813 -> 6586)

Highlights of the per-file fixes
- badge_tasks: dropped the `User = get_user_model()` runtime-call-as-type
  pattern in favour of a direct ``from opencontractserver.users.models
  import User`` (mirrors the conftest pattern used elsewhere).
- doc_analysis_tasks: filter Anthropic ``response.content`` blocks via
  ``hasattr(resp, "text")`` so list comprehensions on the union type are
  legal; renamed the accumulator to avoid the ``str | list`` shadowing.
- analyzer_tasks: tightened ``request_gremlin_manifest`` signature to
  ``int`` and coerce ``add(*doc_ids)`` to ints since ``analyzed_documents``
  rejects str.
- data_extract_tasks: scoped ``# type: ignore[valid-type]`` on the
  ``output_type = list[output_type]`` runtime-parametric pattern; scoped
  ``# type: ignore[typeddict-item]`` on two ``TypedDict(**dict)``
  expansions that mypy cannot prove cover all required keys.
- doc_tasks: validate FieldFile.name before passing to Storage.open;
  threaded the ``parent_id`` field through the FUNSD annotation dict so
  it satisfies the TypedDict; cast ``page_data.bounds`` to
  BoundingBoxPythonType where the dataclass field is widened to
  ``dict[str, int | float]``; widened ``annotation_map`` keys to
  ``int | str`` to match the function's declared return type; coerced
  ``file_type`` to FileTypeEnum before calling
  ``get_components_by_mimetype``.
- embeddings_task: cast ``_create_embedding_for_annotation`` to the
  ``HasEmbeddingMixin``-typed callable signature expected by
  ``_apply_dual_embedding_strategy``; explicit annotation
  ``result: dict[str, Any]`` so the mixed-shape result dict cooperates
  with subscripted update / append.
- export_tasks(_v2): handled None returns from
  ``package_corpus_for_export`` / ``package_label_set_for_export`` by
  raising RuntimeError before constructing the TypedDict; replaced
  ``output_bytes`` (BytesIO) with ContentFile when calling
  FieldFile.save; renamed the inner FUNSD page-data variable to keep its
  ``tuple[int, str, str]`` type distinct from the bytes payload; fixed
  the ``UserExport.error`` typo (the actual field is ``errors``).
- import_tasks(_v2): swapped ``AbstractBaseUser`` for the project's
  concrete ``User`` (same pattern as corpuses/folder_service); tightened
  the V2-only branch so ``data_json.get(...)`` returns the V2-typed
  values instead of the V1∩V2 ``object`` lower bound; widened
  aggregator dict to ``dict[str | int, int]`` to match
  ``import_doc_annotations``'s declared return type.
- fork_tasks: replaced implicit ``Optional[list[str]] = None`` parameter
  defaults; guarded FieldFile.name accesses; annotated the empty
  ``folder_map`` / ``structural_set_map`` accumulators.
- corpus_tasks: dropped a stale ``return summary`` from the
  ``-> None`` task; coerced the ``str | int`` celery args to ``int``
  once at function entry; renamed the inner ``execution`` loop variable
  to avoid shadowing the outer ``for doc_id, execution in
  execution_map.items()`` binding; scoped ``# type: ignore[attr-defined]``
  on Celery's dynamically-attached ``Signature.s`` / ``.si`` attributes.
- extract_orchestrator_tasks: early-return when ``extract_id`` is None;
  scoped ``# type: ignore[attr-defined]`` on ``task_func.si``.

Cross-cutting type tightening
- ``LabelLookupPythonType.{text_labels, doc_labels}`` narrowed from
  ``dict[str | int, AnnotationLabelPythonType]`` to ``dict[str, ...]``
  (the producer in ``utils.etl.build_label_lookups`` already keys by
  ``str(pk)``). Updated the ``corpus_id`` parameter on
  ``build_label_lookups`` from ``str`` to ``int`` to match every call
  site.
- ``utils.importing.load_or_create_labels`` and
  ``utils.importing.import_doc_annotations`` now accept ``Mapping``
  instead of ``dict``, so callers can pass ``OpenContractDocExport`` /
  ``AnnotationLabelPythonType`` TypedDicts directly without ``cast()``.
- ``import_tasks._validate_sidecar_schema`` and ``_apply_sidecar_annotations``
  accept ``str | None`` for the ``sidecar_path`` argument since the
  caller passes a fallback name that can be None.

Verification
- python -m mypy --config-file mypy.ini opencontractserver config
  -> "Success: no issues found in 1032 source files"
- All 16 graduated modules import cleanly under config.settings.mypy.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

Overall this is a high-value PR. Turning off 16 ignore_errors = True blocks in one pass is significant, and the typing exercise uncovered several genuine runtime bugs. The description is unusually thorough, which is appreciated.


Real Bugs Fixed (strong positive)

These are not just type annotation housekeeping — they are actual runtime defects that were hiding behind the baseline:

  • agent_tasks.py: async for event in agent.stream(...)async for event in await agent.stream(...). Missing await on a coroutine returning an async generator is a real async bug.
  • export_tasks_v2.py: export.error = Trueexport.errors = str(e). Wrong attribute name silently swallowed export failure state.
  • export_tasks.py: FieldFile.save(..., BytesIO)FieldFile.save(..., ContentFile(...)). Django's FieldFile.save needs a File-like object, not a raw BytesIO; this could silently corrupt exports depending on the storage backend.
  • doc_analysis_tasks.py: Filtering response.content blocks via hasattr(resp, "text") correctly handles ToolUseBlock items in the Anthropic SDK union, which would otherwise raise AttributeError.
  • doc_tasks.py: Early guard on FieldFile.name before Storage.open turns a cryptic "str | None is not str" crash into a clear ValueError.
  • analyzer_tasks.py: Coercing doc_ids to int before .add() fixes a real ORM mismatch.
  • fork_tasks.py / corpus_tasks.py: Explicit Optional[list[str]] = None and int coercions eliminate silent type widening at Celery boundaries.

Potential Regression: FUNSD Export Key Type

This is the one item I'd want resolved before merge.

In doc_tasks.py, package_doc_for_funsd_export, the key used to be:

page = str(page_data.page_index)   # string key
annotation_map[page] = ...

This PR changes it to:

page = page_data.page_index        # int key
annotation_map[page] = ...

annotation_map is returned and consumed in export_tasks.py:package_funsd_exports, which still does:

if str(index) in funsd_annotations:   # unchanged — looks up string keys

str(0) in {0: [...]} evaluates to False, so FUNSD annotations will be silently omitted from every export after this change. The fix is either:

  • Change export_tasks.py to if index in funsd_annotations:, or
  • Keep string keys and update the dict annotation to dict[str, ...]

Neither direction is wrong; the current state leaves both sides inconsistent.


Minor / Follow-up Items

cast(V2Type, ...) then checking if corpus_export is None (export_tasks_v2.py):

corpus_export = cast(OpenContractCorpusV2Type, package_corpus_for_export(...))
...
if corpus_export is None or label_set_export is None:
    raise RuntimeError(...)

After cast, mypy considers corpus_export non-optional, so the None branch is dead code from a type perspective. The check is still useful defensively but consider reversing the order: check for None first, then cast the non-None value. That way the cast actually reflects what the type system can prove.

# type: ignore[arg-type] on temperature (data_extract_tasks.py line ~444):

temperature=extract_temperature,  # type: ignore[arg-type]

The PR description doesn't mention this one. What is the actual mismatch — is extract_temperature float | None and the parameter float? A narrow guard (if extract_temperature is not None) would be cleaner than silencing the error.

memory_tasks.py# type: ignore[arg-type] on values_list(named=True): Reasonable workaround for a known Django stubs limitation. Consider a brief comment so the next reader doesn't wonder.

Pre-existing mutable default in utils/importing.py:

def load_or_create_labels(..., existing_labels: dict[str, AnnotationLabel] = {}):

Not introduced by this PR, but now that this file is touched: {} as a default is a classic Python footgun. Changing to None and guarding with existing_labels or {} inside the function body would be a safe same-line fix.

process_corpus_action return type (corpus_tasks.py): Dropping return summary from a -> None task is correct per the type annotation, but the PR description should call out that any caller relying on the Celery task result dict will now receive None. A quick grep for .get() on the result of this task would confirm no callers are affected.


Typing Approach

The use of scoped # type: ignore[valid-type] and # type: ignore[typeddict-item] (as opposed to module-level suppression) is exactly the right pattern for genuinely unprovable runtime constructions. The cast("list[Any]", tools) with an explanatory comment about list invariance is also correct and well-documented.

The move from AbstractBaseUser to the concrete User model in TYPE_CHECKING blocks (import_tasks.py, import_tasks_v2.py) is consistent with the project's existing pattern (badge_tasks.py, corpuses/folder_service).


Summary

The PR is in good shape. The FUNSD key regression is the one blocking item — everything else is either clearly correct or a low-risk follow-up. Once that lookup inconsistency is resolved, the baseline reduction and the genuine bug fixes make this well worth merging.

JSv4 added 3 commits May 4, 2026 22:26
Fixes:
- Apply pyupgrade rewrites to import_tasks_v2.py
- Fix FUNSD annotation_map key type regression: doc_tasks.py now uses int
  keys consistently, export_tasks.py looks up by int (not str(index)),
  and all return/parameter type annotations updated accordingly
- Narrow dict[str|int,...] to dict[str,...] in etl.build_label_lookups
  and build_document_export (keys are always str(pk))
- Cast get_component_by_name() results in embeddings.py, embeddings_task.py
  and doc_tasks.py to the specific subtype (BaseEmbedder / BaseParser /
  BaseThumbnailGenerator) to satisfy mypy
- Append FunsdAnnotationType directly rather than via dict() to preserve
  the TypedDict type
Tests added:
- test_convert_doc_to_funsd_annotation_map_uses_int_keys: regression guard
  that annotation_map keys are int, not str (FUNSD key-type fix)
- test_convert_doc_to_funsd_empty_doc_returns_empty_map: empty-annotation path
- test_export_tasks_typing_fixes.py (new file):
  - FinalizeExportTestCase: ContentFile save round-trip, seekable-buffer behaviour
  - OnDemandPostProcessorsTestCase: RuntimeError on missing export file,
    no-op when post_processors is empty
  - PackageFunsdExportsIntKeyTestCase: int-key annotation lookup (regression),
    empty-annotation pages, multi-page multi-annotation packaging
  - PackageCorpusExportV2ErrorsFieldTestCase: export.errors field populated
    on failure (was non-existent export.error before the fix)
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #1541: Graduate opencontractserver.tasks.* from mypy baseline

This is a well-executed typing graduation. The mechanical changes are clean, the # type: ignore annotations are correctly scoped, and the PR description is thorough. Importantly, the process surfaced several genuine pre-existing bugs (not just annotation noise), which are fixed here. A few observations below, roughly ordered by severity.


Genuine Bug Fixes (Good Catches)

1. Silent FUNSD annotation loss (doc_tasks.py / export_tasks.py)

The most impactful fix. doc_tasks.py was building annotation_map with int keys but export_tasks.py looked them up via str(index), so every FUNSD export silently produced empty annotation lists. Both sides now use int keys consistently. The regression tests in test_doc_tasks.py and test_export_tasks_typing_fixes.py cover this well.

2. Wrong field name in V2 export error handling (export_tasks_v2.py line ~267)

# Before
export.error = True
# After
export.errors = str(e)

The field is errors: str, not error: bool. The old code was silently swallowed by Django (unknown attribute assignment on a model instance doesn't raise), leaving exports in an inconsistent state with no error message recorded.

3. M2M add receiving strings (analyzer_tasks.py)

# Before
analysis.analyzed_documents.add(*doc_ids)  # doc_ids: list[int | str]
# After
analysis.analyzed_documents.add(*[int(doc_id) for doc_id in doc_ids])

Celery JSON serialisation widens int to str, so this was a latent runtime bug on every Celery-dispatched call.

4. process_corpus_action returning a value from a -> None task (corpus_tasks.py)

Dropped the stale return summary. Celery ignores task return values by default, so this was harmless, but it's a correctness fix.

5. request_gremlin_manifest not guarding None return (analyzer_tasks.py)

Added return analyzer_manifests or []. The original return analyzer_manifests could return None if the Gremlin request produced no results, breaking callers that iterate the list.


One Type-Correctness Nit

export_tasks_v2.pycast before None check (~line 167)

corpus_export = cast(
    OpenContractCorpusV2Type,
    package_corpus_for_export(corpus, v2_format=True),
)
...
if corpus_export is None or label_set_export is None:
    raise RuntimeError(...)

cast is a runtime no-op so this is safe at runtime, but mypy will now infer corpus_export: OpenContractCorpusV2Type (non-optional) and may flag corpus_export is None as always-false, depending on strictness. Cleaner to check for None first, then cast:

_corpus_export = package_corpus_for_export(corpus, v2_format=True)
_label_set_export = package_label_set_for_export(corpus.label_set)
if _corpus_export is None or _label_set_export is None:
    raise RuntimeError(...)
corpus_export = cast(OpenContractCorpusV2Type, _corpus_export)
label_set_export = _label_set_export

This is minor and does not affect runtime behaviour.


Pre-existing Issue Surfaced (Out of Scope, FYI)

Mutable default argument in utils/importing.py

def load_or_create_labels(
    ...
    existing_labels: dict[str, AnnotationLabel] = {},  # ← shared across calls
) -> dict[str, AnnotationLabel]:

This is a Python anti-pattern where the default {} is shared across all call sites that don't pass the argument explicitly. In practice it won't cause problems here if the dict is never mutated through the default (it's only read), but it's fragile. Not introduced by this PR, just noting it.


Minor Notes

  • badge_tasks.py: Switching from get_user_model() to a direct import of User is consistent with how the rest of the project handles this (e.g., conftest.py, corpuses/folder_service). Not objectionable.
  • data_extract_tasks.py: temperature=extract_temperature # type: ignore[arg-type] — the suppression is scoped, but it would be worth confirming extract_temperature is typed as float | None while the parameter expects float. If so, a simple temperature=extract_temperature or 0.0 (or a proper None guard) might be cleaner than the ignore.
  • doc_tasks.py: The [dict(row) for row in corpus_data] materialisation passed to _create_document_processed_notifications is correct (converts ValuesQuerySet rows to plain dicts as the signature expects).
  • Scoped # type: ignore comments: All ignores are narrow (e.g., [attr-defined], [valid-type], [typeddict-item]). That's the right approach.

Test Coverage

The new test file test_export_tasks_typing_fixes.py is solid — it covers the ContentFile wrapping in finalize_export, the RuntimeError guard in on_demand_post_processors, the int-key regression in package_funsd_exports, and the errors field fix in package_corpus_export_v2. The additions to test_doc_tasks.py round out the FUNSD key-type coverage. Good test quality overall.


Summary: The PR achieves its stated goal cleanly and, more importantly, the typing exercise uncovered and fixed several real production bugs. The nit about cast order in export_tasks_v2.py is the only change I'd consider before merge. Everything else looks correct.

- export_tasks_v2.py: reorder so the None check on package_corpus_for_export
  happens before the cast, allowing mypy to actually narrow the type
  (review #1541 nit). Cast is then a runtime no-op for the narrowed value.
- test_export_tasks_typing_fixes.py: black reformat picked up by CI.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Graduate opencontractserver.tasks.* from mypy baseline

This is a solid, well-scoped typing PR. The diff is large but nearly all of it is legitimate type-annotation work. More importantly, the process uncovered several real production bugs that are fixed along the way. Notes below.


Real Bug Fixes (Excellent — these are the most valuable parts of this PR)

1. Silent export-failure tracking (export_tasks_v2.py line ~266)
The old export.error = True referenced a non-existent model field. Django's dynamic attribute assignment meant this was silently setting a throwaway instance attribute — it was never persisted. The UserExport model has an errors = TextField(...) field, not an error boolean. This fix from export.error = Trueexport.errors = str(e) turns a silent failure-masking bug into a properly persisted error string. The regression test in test_export_tasks_typing_fixes.py correctly validates this.

2. FUNSD annotation key type mismatch (doc_tasks.py / export_tasks.py)
convert_doc_to_funsd was building an annotation map with str keys (e.g. '0'), while package_funsd_exports was looking up by integer index (if index in funsd_annotations). This meant all FUNSD annotations were silently dropped during export — a data-loss bug. The int-key fix plus the regression test in test_doc_tasks.py (checking assertIsInstance(key, int)) correctly pins this.

3. ContentFile wrapping in finalize_export (export_tasks.py line ~88)
Passing raw BytesIO to FieldFile.save() is incorrect — Django's FieldFile.save() expects a File or ContentFile wrapper. Some storage backends (S3, GCS) would silently fail or produce corrupt files. Wrapping with ContentFile(output_bytes.read()) is the right idiom.

4. None-return guard for package_corpus_for_export / package_label_set_for_export
Both functions can return None, and the old code was passing them directly into TypedDict fields. Adding the RuntimeError guard before TypedDict construction is correct — it makes the failure loud and attributable rather than silently corrupting the export payload.


Type System Changes (Good)

  • LabelLookupPythonType narrowing from dict[str | int, …] to dict[str, …] is verified correct — the producer (build_label_lookups) always emits str(pk) keys.
  • Mapping instead of dict for load_or_create_labels / import_doc_annotations is the right Liskov substitution (TypedDicts are Mappings, not dicts).
  • # type: ignore comments are all correctly scoped to Celery's dynamic .s() / .si() / .si() attributes, which genuinely lack stubs. No broad suppression.
  • build_label_lookups(corpus_id: int) correcting from str to int aligns with every call site.

Minor Concerns

assert s3 is not None in package_funsd_exports (export_tasks.py):

assert s3 is not None  # set above when STORAGE_BACKEND == "AWS"

Python assert statements are stripped when running with the -O (optimize) flag. In Celery workers this is unlikely but not impossible. Prefer an explicit guard:

if s3 is None:
    raise RuntimeError("S3 client unavailable for AWS storage backend")

This is a minor nit since Celery workers rarely run with -O, but it's more robust.

Celery serialization comment in corpus_tasks.py (~line 187):

# Coerce ids that may have arrived as strings (Celery JSON serialisation
# widens ``int`` to ``str``) ...

Celery's standard JSON serializer actually preserves int types — it doesn't widen them to str. The coercion is still defensively correct (and harmless), but the comment is misleading. If the values are arriving as strings it's more likely due to the caller constructing them from URL params or form data. Suggest updating the comment to something like: # Guard against callers that pass string IDs (e.g. from Celery task args constructed via URL params).

process_analyzer early None guard (corpus_tasks.py ~line 104):
The new ValueError("process_analyzer requires a non-null analyzer") replaces what would have been an AttributeError on .id. This is strictly better, but worth confirming no existing callsite intentionally passes analyzer=None as a no-op. A quick grep of callers to verify would be prudent.


Test Coverage

The new test_export_tasks_typing_fixes.py file covers exactly the four real bugs found:

  • ContentFile wrapping in finalize_export
  • RuntimeError on missing export.file in on_demand_post_processors
  • Int keys in FUNSD annotation map (via package_funsd_exports)
  • export.errors field (vs. the non-existent export.error) in package_corpus_export_v2

The test_doc_tasks.py additions (int-key regression + empty-doc baseline) are also well-targeted. Good test coverage for a typing PR.


Summary

Approve with minor suggestions. The type graduation is clean, # type: ignore uses are correctly scoped, and the PR surfaced four genuine production bugs (silent failure tracking, FUNSD data loss, BytesIO storage incompatibility, and None propagation into TypedDicts). The two actionable suggestions are: (1) replace assert s3 is not None with an explicit RuntimeError guard, and (2) correct the Celery serialization comment in corpus_tasks.py. Neither is a blocker.

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review — PR #1541: Graduate opencontractserver/tasks/* from mypy baseline

Overview

This PR finishes the typing graduation of all 16 opencontractserver/tasks/* modules. It removes 16 [mypy-…] baseline blocks from mypy.ini and prunes 227 lines from docs/typing/mypy_baseline.txt. Importantly, several changes in this PR fix real runtime bugs, not just appease mypy.


Real Bug Fixes (beyond pure typing)

These are the most important changes in the PR, each backed by a regression test:

agent_tasks.py — missing await

# Before (silently iterates a coroutine, never yields):
async for event in agent.stream(user_message, store_messages=False):
# After (correct):
async for event in await agent.stream(user_message, store_messages=False):

This was a silent failure: iterating a coroutine object directly produces no events. The fix is correct.

export_tasks_v2.py — wrong field name in exception handler

# Before (AttributeError would shadow the original exception):
export.error = True
# After:
export.errors = str(e)

The double-masking bug (wrong field raises AttributeError which eats the original exception) is now correctly handled and tested.

export_tasks.pyBytesIO passed to FieldFile.save

# Before (FieldFile.save requires a File, not a raw BytesIO):
export.file.save(filename, output_bytes)
# After:
export.file.save(filename, ContentFile(output_bytes.read()))

Correct fix. The test test_finalize_export_reads_from_seekable_buffer also verifies the explicit seek(0) contract.

export_tasks.py — FUNSD annotation_map int/str key mismatch

doc_tasks.convert_doc_to_funsd built annotation_map with int page keys. package_funsd_exports looked up with str(index). The result: all FUNSD annotations were silently dropped. Both sides now use int consistently, and test_annotation_on_page_zero_appears_in_zip_with_int_key is a solid regression guard.


Code Quality — Positive Observations

  • Scoped # type: ignore comments ([attr-defined], [valid-type], [typeddict-item]) are precise rather than blanket, which is the right practice. The Celery .s / .si attribute ignores are unavoidable until upstream adds stubs.
  • Mapping over dict for load_or_create_labels and import_doc_annotations is the idiomatic solution for accepting TypedDicts without cast(). Good call.
  • build_label_lookups(corpus_id: int) correction removes a long-standing mismatch between the declared str type and integer call-sites (lookup_tasks, export_tasks_v2).
  • LabelLookupPythonType narrowing from dict[str | int, …] to dict[str, …] is safe: build_label_lookups already keys via str(pk). Narrowing this removes width from every consumer.
  • _validate_sidecar_schema / _apply_sidecar_annotations accepting str | None for sidecar_path matches how callers pass a fallback that may be None.
  • Guard on FieldFile.name before Storage.open (doc_tasks, fork_tasks) prevents a runtime TypeError for documents without an uploaded file.

Minor Suggestions / Observations

1. import_tasks_v2.py — redundant membership test in walrus comprehension

source_ids: list[int] = [
    new_id
    for old_id in rel_data.get("source_annotation_ids", [])
    if str(old_id) in annot_id_map              # ← redundant
    and (new_id := annot_id_map.get(str(old_id))) is not None
]

The str(old_id) in annot_id_map guard is redundant when the walrus expression already checks is not None. A get on a missing key returns None, so the None check is sufficient:

source_ids: list[int] = [
    new_id
    for old_id in rel_data.get("source_annotation_ids", [])
    if (new_id := annot_id_map.get(str(old_id))) is not None
]

Not a correctness issue, but two dict lookups per element is worth cleaning up.

2. data_extract_tasks.py — unaddressed temperature type ignore

temperature=extract_temperature,  # type: ignore[arg-type]

This suppression isn't mentioned in the PR description. If extract_temperature can be None or an unexpected type, this may be masking a real caller bug. Worth a brief comment explaining why the ignore is needed (e.g., "the function accepts float | str but extract_temperature is narrower/wider than the stub").

3. corpus_tasks.py — removed return summary was dead code

The PR correctly removes return summary from a -> None task. Worth noting this was unreachable dead code (Celery never uses the return value of shared_task functions in fire-and-forget mode). No action needed; just calling it out as a positive cleanup.

4. import_tasks.pyTYPE_CHECKING import alias

if TYPE_CHECKING:
    from opencontractserver.users.models import User as AbstractBaseUser

Aliasing User as AbstractBaseUser is slightly confusing since they are distinct abstractions. Naming it User directly (as done in import_tasks_v2.py) would be clearer. Minor nit.


Test Coverage

The new test_export_tasks_typing_fixes.py (326 lines) covers all four primary bug fixes with good edge cases:

  • FinalizeExportTestCase — file save + seek behaviour
  • OnDemandPostProcessorsTestCase — missing file guard + no-op path
  • PackageFunsdExportsIntKeyTestCase — int-key regression, empty annotation, multi-page
  • PackageCorpusExportV2ErrorsFieldTestCaseerrors field on failure

test_doc_tasks.py additions cover the int-key regression and the empty-annotation base case.

One gap: there are no tests for the await agent.stream() fix in agent_tasks.py. The async streaming path is hard to unit-test, but a simple mock that verifies the awaitable is consumed would prevent a future regression.


Summary

This is a well-executed graduation PR. The real bug fixes (missing await, wrong error field, BytesIO / ContentFile, FUNSD key mismatch) make this more valuable than a pure type-annotation pass. The targeted # type: ignore scoping and the Mapping widening are idiomatic. The test coverage is solid for all behaviour changes. The minor items above are suggestions, not blockers.

Verdict: Approve with optional follow-ups on the redundant membership test and the temperature ignore explanation.

JSv4 added 2 commits May 5, 2026 00:37
…ignore, rename UserModel alias

- import_tasks_v2.py: walrus comprehension drops the redundant
  'str(old_id) in annot_id_map' guard. dict.get returns None for unknown
  keys, so the 'is not None' check on the walrus result is sufficient
  (one dict lookup per element instead of two).
- data_extract_tasks.py: explain why temperature=extract_temperature
  carries '# type: ignore[arg-type]'. The downstream stub declares
  temperature: float; we intentionally pass None for Claude models so
  the temperature=0 override engages. Tracked in #1381.
- import_tasks.py: rename TYPE_CHECKING 'User as AbstractBaseUser' to
  'User as UserModel' for consistency with import_tasks_v2.py. The alias
  is required because the runtime 'User = get_user_model()' would
  otherwise collide with the type-only import.
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code Review: PR #1541 — Typing graduation for opencontractserver.tasks.*

This is a clean, well-scoped typing PR. The baseline-drain approach is sound — removing 227 lines from the baseline and all 16 [mypy-…] ignore blocks in one shot is a meaningful checkpoint. Several genuine bugs were caught by the type-checker in the process; those are worth calling out explicitly.


Real bugs fixed by the type-checker (not just annotation noise)

  1. agent_tasks.py — missing await on agent.stream() (async for event in await agent.stream(...))
    This was a real runtime bug. Without await, the async for iterates over a Coroutine object rather than the AsyncGenerator, which raises TypeError at runtime. The fix is correct.

  2. export_tasks_v2.pyexport.errorexport.errors
    UserExport has no error field. Before this fix, any exception in package_corpus_export_v2 would cause the except-block itself to raise AttributeError, hiding the original error and leaving backend_lock=True forever. The errors field fix + regression test is exactly right.

  3. doc_tasks.py / export_tasks.py — FUNSD annotation_map key type mismatch
    doc_tasks.convert_doc_to_funsd was building annotation_map with page = str(page_data.page_index) (string key), while export_tasks.package_funsd_exports was looking up with if index in funsd_annotations (integer index). After the fix both sides use int keys consistently, and all FUNSD annotations that were previously silently dropped will now be included in exports. The regression test is a good guard.

  4. export_tasks.pyContentFile wrapper on FieldFile.save
    BytesIO doesn't fully satisfy Django's internal File protocol. Wrapping with ContentFile(output_bytes.read()) is the correct fix.

  5. fork_tasks.pyFieldFile.name guard before Storage.open
    An unset FieldFile.name would crash default_storage.open(None). The if old_label_set.icon and old_label_set.icon.name: guard is correct.

  6. analyzer_tasks.pyreturn analyzer_manifests or []
    Prevents a None return from a function typed as list[AnalyzerManifest].

  7. extract_orchestrator_tasks.py — early return on extract_id is None
    Extract.objects.get(pk=None) would raise DoesNotExist/TypeError depending on the DB backend. The explicit guard + logger.error is better.


Type-system tightening — all look correct

  • LabelLookupPythonType.{text_labels, doc_labels} narrowed to dict[str, …] — The producer in build_label_lookups already emits str(pk) keys, so the narrowing matches reality.
  • build_label_lookups(corpus_id: int) — Every call site passes an integer. Fixing the declared type is correct.
  • load_or_create_labels / import_doc_annotations accept Mapping — Correct application of Liskov; TypedDicts satisfy Mapping but not dict.
  • _validate_sidecar_schema / _apply_sidecar_annotations accept str | None for sidecar_path — Correct; callers may pass a fallback None.

Observations / suggestions

1. Remaining # type: ignore — tracked appropriately?

The scoped ignores in data_extract_tasks.py reference issue #1381 inline, which is good. The ignores in import_tasks_v2.py for unpack_label_set_from_export / unpack_corpus_from_export / prepare_import_labels don't have issue references:

labelset_obj = unpack_label_set_from_export(label_set_data, user_obj)  # type: ignore[arg-type]

Are these tracked somewhere, or is the intent to fix them in the same typing-graduation series? If not tracked yet, worth adding a # TODO(#NNNN) reference so they don't go stale.

2. cast("list[Any]", tools) in agent_tasks.py

This silences the invariant-list complaint but also loses type safety for callers passing a list[str]. The real fix is to widen the AgentAPI.for_corpus/for_document tools parameter to Sequence[str | CoreTool | …] (which is covariant), which would let mypy accept list[str] without a cast. If that's tracked under the same umbrella, the cast is fine short-term, but worth noting.

3. assert s3 is not None in export_tasks.py

assert s3 is not None  # set above when STORAGE_BACKEND == "AWS"

assert statements are stripped with python -O. In a production task that runs under Celery, this is equivalent to no guard if optimizations are enabled. Consider a proper guard:

if s3 is None:
    raise RuntimeError("S3 client is None in AWS storage branch — should not happen")

4. corpus_tasks.py — dropped return summary

The return summary removal is correct per the -> None annotation, but it's worth confirming no call site relied on the returned dict. Quick grep in the repo would be sufficient; the callers are Celery delay()/apply_async() invocations so in practice nothing captures the return value, but explicit verification is cheap.

5. Double blank line in CHANGELOG.md — Minor: there are two blank lines after the new entry (lines 10-11 in the diff). One is conventional.


Test coverage

Solid. test_export_tasks_typing_fixes.py is a well-structured new test file with four focused test classes targeting the real bugs. The additions to test_doc_tasks.py for int-keyed annotation_map are exactly the right regression guards. Good to see the tests are exercising the actual data flow (building a real zip, reading it back) rather than just patching everything away.


Summary

Approve with minor suggestions. The type fixes are correct, the real bugs uncovered are all addressed, test coverage is present for the most important behavioral changes, and the approach of scoped # type: ignore with documented reasons is appropriate for the remaining edge cases. The observations above (tracked ignores, cast vs. API widening, assert in production path) are all non-blocking but worth addressing before the baseline drain reaches the tests package.

JSv4 added 2 commits May 5, 2026 22:30
…GELOG

Three follow-ups from the Claude review on PR #1541:

- Replace assert s3 is not None in export_tasks.py with a real
  RuntimeError guard. Asserts are stripped under python -O and
  Celery tasks may run with optimisations enabled in production, so
  the assert effectively no-ops in the AWS branch. The new guard
  fails loud with a descriptive message — defence-in-depth for a
  branch that should be unreachable but covers any future ordering
  changes to the s3 initialisation above.

- Add # TODO(#1482) references to the three # type: ignore
  suppressions in import_tasks_v2.py. The dict<->TypedDict bridging
  is not a permanent state — it goes away once
  unpack_label_set_from_export / unpack_corpus_from_export /
  prepare_import_labels widen their signatures to Mapping[str, Any]
  in the next typing-graduation chunk. The TODO refs ensure these
  ignores can't quietly outlive the issue they belong to.

- Drop the stray double blank line between the new tasks-graduation
  changelog entry and the Docker hardening entry. The project norm
  is one blank line between entries.
@claude
Copy link
Copy Markdown

claude Bot commented May 6, 2026

Code Review

Overview

This PR graduates all 16 opencontractserver.tasks.* modules from the mypy baseline. Beyond typing cleanup, several real bugs were fixed in the process — which is exactly the value of this exercise. The changes are methodical, the scope is well-bounded, and the test coverage directly targets the behavior changes.


Real Bugs Fixed (worth calling out explicitly)

These are not just annotation cleanup — they fix actual runtime or data-correctness problems:

agent_tasks.py — Missing await on agent.stream() (line ~202). The coroutine was never iterated because async for event in <coroutine> silently fails; the fix adds the required await. This was a genuine runtime bug.

export_tasks_v2.pyexport.error = Trueexport.errors = str(e). The field error doesn't exist on UserExport; this would have raised AttributeError in the exception handler, masking the original failure and leaving backend_lock = True forever. Now the actual error string is recorded and the lock is cleared.

FUNSD key type mismatchdoc_tasks.py was building annotation_map with int keys, but export_tasks.py was looking them up with str(index). The result: all FUNSD annotations were silently dropped during export. The fix aligns both sides on int keys and adds regression tests in both test_doc_tasks.py and test_export_tasks_typing_fixes.py. Good catch.

export_tasks.pyexport.file.save(filename, output_bytes) where output_bytes is a BytesIO. Django's FieldFile.save expects a File instance, not a raw BytesIO. Wrapping in ContentFile is correct.

corpus_tasks.py / extract_orchestrator_tasks.py — Coercing Celery str | int args to int at function entry is correct; Celery's JSON serializer can widen integers to strings depending on how tasks are called, and downstream APIs (e.g. bulk_queue, CorpusActionExecution.get) expect concrete int.


Type Narrowing Quality

LabelLookupPythonType narrowing (dict[str | int, …]dict[str, …]) is sound — the producer (build_label_lookups) has always keyed by str(pk). Narrowing corpus_id on build_label_lookups from str to int matches every call site.

Mapping widening for load_or_create_labels / import_doc_annotations is correct and Liskov-friendly. TypedDicts are structurally compatible with Mapping[str, Any].

cast() usage is appropriate and minimal — used for get_component_by_name return types (untyped registry), Celery's list invariance issue, and the V2 corpus type narrowing after a None-check. No broad cast(Any, …) escapes.

# type: ignore scoping is properly scoped with error codes on every instance (e.g. [attr-defined], [valid-type], [typeddict-item]). No bare type: ignore entries.


Minor Issues / Suggestions

import_tasks_v2.pyTODO(#1482) markers: Three ignores are tagged TODO(#1482) referencing the "typing-graduation umbrella." Since this PR is #1482 and those utilities (unpack_label_set_from_export, unpack_corpus_from_export, prepare_import_labels) live outside the tasks package, the TODOs should reference the broader tracker (#1447) or a follow-up issue, not the current one. Minor but will cause confusion when this PR merges.

doc_tasks.py — early ValueError for missing file names:

if not doc.pawls_parse_file.name or not doc.pdf_file.name:
    raise ValueError(f"Document {doc_id} is missing pawls_parse_file or pdf_file")

This is a behavioral improvement (fail-fast with a clear message vs. a confusing TypeError from Storage.open(None)), but worth noting in the CHANGELOG/PR notes since callers that caught the old exception type will need to handle ValueError instead.

memory_tasks.py # type: ignore[arg-type]: The ignore on conversation.chat_messages.filter(...) passed to database_sync_to_async(lambda: list(...)) suppresses a legitimate mypy complaint about QuerySet[Row4] vs the expected iterable type. A brief note explaining why mypy can't infer this (the named-tuple values_list return type is opaque to mypy's Django plugin) would help future readers.

corpus_tasks.py comment style: The comment block added for Celery arg coercion is useful, but per the project style guide in CLAUDE.md, inline comments should be kept to one short line for non-obvious WHY reasoning. Consider condensing to: # Celery JSON serialisation may widen int → str.

data_extract_tasks.py temperature ignore: The # type: ignore[arg-type] # TODO(#1381) is fine as a bridge. Just make sure #1381 is still open and tracked — the described fix (widening temperature to Optional[float]) is straightforward.


Test Coverage

The new tests are solid:

  • test_convert_doc_to_funsd_annotation_map_uses_int_keys directly guards the FUNSD key regression.
  • test_convert_doc_to_funsd_empty_doc_returns_empty_map is a useful boundary case.
  • test_export_tasks_typing_fixes.py tests all four behavior changes (ContentFile, missing-file guard, int-key FUNSD lookup, errors field). The temp-file cleanup with try/finally is correct.

One gap: there's no test for the agent.stream() missing-await fix. That's hard to unit-test (needs a running agent), but a note in the PR description about manual verification would be reassuring.


Summary

High-quality typing graduation PR with real bug fixes bundled in. The three TODO(#1482) markers in import_tasks_v2.py should be updated to reference the correct tracking issue before merge. Everything else looks correct and consistent with project conventions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typing: graduate opencontractserver.tasks.* from mypy baseline

2 participants