Type-check opencontractserver.llms modules (issue #1484) #1543
…line Closes #1484. Removes the per-module ignore_errors blocks for the nine remaining llms.* modules (agents, api/client, vector_stores) and fixes the underlying ~125 type errors so mypy runs clean on this surface.

Notable fixes:
- Replaced `callable`/`any` placeholders with `typing.Callable[..., Any]` and `typing.Any` across api/client/vector_stores.
- Made implicit Optionals explicit (`no_implicit_optional` defaults).
- Tightened `_resolve_framework`/`_resolve_tools` helpers in llms.api so framework normalisation produces a concrete `AgentFramework`.
- Typed Annotation/ChatMessage querysets as their custom subclasses so the `search_by_embedding` extension method resolves.
- Switched `CorpusAgentContext.documents` to a non-Optional list with `default_factory=list`; aligned `CoreAgent.stream` / `CoreAgent.resume_with_approval` Protocol signatures with their PydanticAI implementations.
- Pruned now-unused `# type: ignore` comments and the unused `CoreAnnotationVectorStore` annotation from `pydantic_ai_tools`.
- Pruned the corresponding 189 advisory entries from `docs/typing/mypy_baseline.txt`.
Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com>
Code Review: PR #1543 — Type-check opencontractserver.llms modules
Overview
This PR graduates 9 modules from the mypy
Real bugs fixed — worth highlighting
These are not just type annotations; they are correctness fixes:
Positive patterns used well
Issues and suggestions
Medium —
- embeddings.py: cast get_component_by_name to type[BaseEmbedder] and annotate embedder_class so the Optional return type is satisfied.
- core_vector_stores.py / core_conversation_vector_stores.py: raise ValueError when get_embedder() yields no path (search_by_embedding requires str, not Optional[str]). Failing fast at construction time is preferable to a downstream type error.
- core_agents.py: lift cast import to module-level and replace the unreachable-by-NotImplementedError yield with the conventional if False: idiom (review nit).
- pydantic_ai_agents.py: log a warning when a paused message's pending_tool_call is missing 'name' so malformed persistence state doesn't silently drive downstream lookups with an empty tool name.
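The `if False:` idiom mentioned for core_agents.py can be sketched in isolation. This is a minimal illustration rather than the project's code: `BaseStreamer` and `consume` are hypothetical names, and only the error message is borrowed from the review thread. The never-executed `yield` makes Python treat the function as an async generator, so the `NotImplementedError` surfaces on the first iteration instead of at call time.

```python
import asyncio
import inspect
from collections.abc import AsyncGenerator


class BaseStreamer:
    """Hypothetical base class used only for this illustration."""

    async def _stream_core(self, message: str) -> AsyncGenerator[str, None]:
        # The branch below never runs, but the presence of `yield` in the body
        # is what turns this function into an async generator function.
        if False:
            yield ""
        raise NotImplementedError("_stream_core() must be implemented by adapter")


async def consume() -> str:
    """Iterate the default implementation and report the error it raises."""
    try:
        async for _ in BaseStreamer()._stream_core("hi"):
            pass
    except NotImplementedError as exc:
        return str(exc)
    return "no error"


result = asyncio.run(consume())
print(inspect.isasyncgenfunction(BaseStreamer._stream_core), result)
```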
Code Review: Type-check opencontractserver.llms modules (#1484)
This PR graduates the remaining
Behavioral Changes Hidden in a Typing PR
Several changes go beyond annotation-only fixes and alter runtime behavior. These are worth calling out explicitly.
# Before
documents: Optional[list[Document]] = None
# After
documents: list[Document] = field(default_factory=list)
The
# Before (None-safe default)
content=raw.get("content", raw.get("text", ""))
# After (falsy-check, treats empty string "" as missing)
content = raw.get("content") or raw.get("text") or ""
The new version will fall through to
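The difference between the two expressions quoted above can be checked directly. The sketch below wraps each one in a small function (`before` and `after` are names introduced here for illustration) and runs them on three hypothetical payloads. Note that the original expression returns an explicit `None` value unchanged when the key is present, while the `or` chain skips every falsy value.

```python
from typing import Any


def before(raw: dict[str, Any]) -> Any:
    # Original form: the default is used only when "content" is absent.
    return raw.get("content", raw.get("text", ""))


def after(raw: dict[str, Any]) -> Any:
    # New form: any falsy "content" (None, "", 0) falls through to "text".
    return raw.get("content") or raw.get("text") or ""


raw_missing = {"text": "fallback"}
raw_empty = {"content": "", "text": "fallback"}
raw_none = {"content": None, "text": "fallback"}

for payload in (raw_missing, raw_empty, raw_none):
    print(payload, "->", repr(before(payload)), "vs", repr(after(payload)))
```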
Protocol Contract Changes
# Before
async def stream(self, message: str, **kwargs) -> AsyncGenerator[...]
# After
def stream(self, message: str, **kwargs: Any) -> AsyncGenerator[...]
Changing the Protocol method from
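The Protocol signature change quoted above lines up with how async generator functions behave: calling one returns the generator object immediately, with nothing to await. A minimal sketch follows, using the hypothetical names `Streamer`, `EchoAgent`, and `collect`. The Protocol declares a plain `def` returning `AsyncGenerator`, and the implementation is an `async def` containing `yield`.

```python
import asyncio
import inspect
from collections.abc import AsyncGenerator
from typing import Any, Protocol


class Streamer(Protocol):
    # A Protocol stub has no `yield` in its body, so declaring it with
    # `async def` would describe a coroutine that returns a generator.
    def stream(self, message: str, **kwargs: Any) -> AsyncGenerator[str, None]: ...


class EchoAgent:
    """Hypothetical implementation: an async generator method."""

    async def stream(self, message: str, **kwargs: Any) -> AsyncGenerator[str, None]:
        for word in message.split():
            yield word


async def collect(agent: Streamer, message: str) -> list[str]:
    # No `await` before agent.stream(...): the call already yields the generator.
    return [chunk async for chunk in agent.stream(message)]


words = asyncio.run(collect(EchoAgent(), "type check passes"))
print(words, inspect.isasyncgen(EchoAgent().stream("x")))
```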
Code Review: Type-check opencontractserver.llms modules (#1543)
This is a well-scoped, systematic typing PR that cleans up ~125 mypy errors across 9 modules and graduates them out of the baseline. Overall the approach is sound and the changes are correct. A few items worth discussing:
Positive Changes Worth Calling Out
Issues / Concerns
1. Changing from explicit keyword parameters to
2. The
3. The Protocol's
4. New
# pydantic_ai_agents.py
async for event in tool_stream:  # type: ignore[assignment]
This PR removes many
5. Typing
6. Duplicated
The sync and async variants of the conversation vector store class now both contain an identical ≈30-line
Missing Test Coverage
The following are behaviour changes (not just typing changes) with no accompanying tests:
Minor Nits
Summary
The typing work is thorough and correct. The real behaviour fixes (sync_to_async, permission checks, isinstance event dispatch, dataclasses guard) are genuinely valuable. The main asks before merging:
…-lgZT0 # Conflicts: # CHANGELOG.md
… tests Two follow-ups from the Claude review on PR #1543:
- The async for event in tool_stream: # type: ignore[assignment] in pydantic_ai_agents.py was the only new ignore introduced by the typing graduation. Removing it would either require renaming the loop variable across ~30 reference sites in the same branch (every event.part.* / event.result.* access) or a cast(Any, ...) that surfaces unrelated mypy errors downstream (the casted event flows into serializable_args whose four assignment branches give it different types). Add a multi-line comment explaining why the narrow [assignment]-only suppression is the right tradeoff and what the runtime contract is, so a future reader can either accept the ignore or rename with full context.
- Add test_llms_typing_behavior_guards.py covering the runtime guards the PR introduced that the review flagged as untested:
  * _resolve_framework: TypeError on unrecognised non-string objects, ValueError on unknown strings, pass-through for str / enum / None inputs.
  * PermissionError contract on add_document_note_tool / duplicate_annotations_tool / add_exact_string_annotations_tool — pinned via the message-string contract (3+ guards in source) so a refactor that drops one surfaces here.
  * Anti-regression guards for the sync_to_async(list)(queryset) bug, the dataclasses.is_dataclass + isinstance(type) guard, the isinstance(ev, ThoughtEvent) dispatch (vs. string-compare), and the llm_msg_id is not None persistence guard in _stream_core.
  * CorpusAgentContext.documents default_factory=list (vs. Optional) contract, including a runtime check that documents == [] when the caller doesn't pre-populate.

These also restore the PR's codecov backend coverage by exercising the new branches in api.py and pydantic_ai_agents.py.
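The `_resolve_framework` contract listed above (TypeError for non-string objects, ValueError for unknown strings, pass-through otherwise) can be sketched as follows. This is an assumption-laden illustration, not the code in llms.api: the enum value `"pydantic_ai"`, the function name `resolve_framework`, the choice to return `None` unchanged, and the `error_name` helper are all hypothetical.

```python
from enum import Enum
from typing import Optional, Union


class AgentFramework(Enum):
    # Member name comes from the thread; the string value is an assumption.
    PYDANTIC_AI = "pydantic_ai"


def resolve_framework(
    framework: Union[AgentFramework, str, None],
) -> Optional[AgentFramework]:
    """Normalise a framework argument (hypothetical sketch)."""
    if framework is None or isinstance(framework, AgentFramework):
        return framework
    if isinstance(framework, str):
        try:
            return AgentFramework(framework.lower())
        except ValueError:
            raise ValueError(f"Unknown framework: {framework!r}") from None
    raise TypeError(f"Unsupported framework type: {type(framework).__name__}")


def error_name(value: object) -> Optional[str]:
    """Return the exception class name raised for `value`, if any."""
    try:
        resolve_framework(value)  # type: ignore[arg-type]
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(resolve_framework("PYDANTIC_AI"), error_name(42), error_name("nope"))
```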
Code Review — PR #1543: Type-check
…-lgZT0 # Conflicts: # CHANGELOG.md # docs/typing/mypy_baseline.txt
- Convert 8 ``assert ... # nosec B101`` security/invariant guards to
explicit ``if/raise RuntimeError`` so the checks survive ``python -O``
optimisation (asserts are stripped under ``-O``). Sites:
* ``CoreConversationManager.create_for_document`` / ``create_for_corpus``
(cid resolution)
* ``CoreConversationManager`` ``store_*_message`` x3 (factory user_id
invariant)
* ``ConversationVectorStore`` / ``CoreAnnotationVectorStore`` corpus_id
auto-detect branches.
- Replace ``vector_store: Optional[Any]`` on ``PydanticAIDependencies``
with a typed ``Optional[_AgentVectorStoreProto]``. The new Protocol
declares the only attribute callers actually reach for
(``similarity_search``) and avoids the circular import that would
otherwise force ``Union[CoreAnnotationVectorStore,
PydanticAIAnnotationVectorStore]``.
- Document ``CorpusAgentContext.initialize`` reload semantics: the
switch from ``is None`` to ``not self.documents`` was deliberate to
match the post-typing default (``[]`` instead of ``None``); the
docstring now spells out that callers wanting "no docs — skip load"
should not invoke ``initialize()`` at all.
- ``VectorStoreAPI.create`` no longer reads
``LLMS_DOCUMENT_AGENT_FRAMEWORK`` — the leak from the document-agent
path was unintentional. New dedicated setting
``LLMS_VECTOR_STORE_FRAMEWORK`` with the same fallthrough default
(``AgentFramework.PYDANTIC_AI``), so existing deployments keep
working without a settings change.
- ``TimelineBuilder._timeline`` accumulation comment now explains why
the storage stays as ``list[dict[str, Any]]`` (mutating ``entry``
dicts post-construction would be rejected by mypy if typed as the
closed ``TimelineEntry`` TypedDict) and where the schema contract is
pinned (the property-level ``cast``).
- ``test_llms_typing_behavior_guards``:
* Rename three importability-only tests
(``test_aadd_*_is_not_called_when_user_id_is_none``) to
``..._is_importable`` so the scope matches what they actually
assert; behaviour pin remains in
``test_permission_error_message_format``.
* All ``Path("opencontractserver/...")`` source-grep call sites
now resolve via ``Path(__file__).resolve().parents[2]`` so the
tests work regardless of the runner's CWD.
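The rationale for replacing `assert` with `if/raise` can be demonstrated without restarting the interpreter. The sketch below compiles two hypothetical `check` functions with the built-in `compile(..., optimize=...)`, where level 1 mirrors `python -O`; the helper names `load` and `outcome` are introduced here for illustration only.

```python
ASSERT_SRC = '''
def check(corpus_id):
    assert corpus_id is not None, "corpus_id is None"
    return corpus_id
'''

GUARD_SRC = '''
def check(corpus_id):
    if corpus_id is None:
        raise RuntimeError("internal invariant violated: corpus_id is None")
    return corpus_id
'''


def load(src: str, optimize: int):
    """Compile `src` at the given optimisation level and return its check()."""
    namespace: dict = {}
    exec(compile(src, "<guard-demo>", "exec", optimize=optimize), namespace)
    return namespace["check"]


def outcome(check) -> str:
    """Call check(None) and describe what happened."""
    try:
        return f"returned {check(None)!r}"
    except (AssertionError, RuntimeError) as exc:
        return type(exc).__name__


for label, src in (("assert", ASSERT_SRC), ("if/raise", GUARD_SRC)):
    for level in (0, 1):
        print(label, f"optimize={level}:", outcome(load(src, level)))
```

At level 1 the assert version lets `None` through, while the explicit guard still raises.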
CI fix:
- ``test_shared_managers.py``: re-format the multi-line ``order_by + values_list + first`` chain that black wanted as a single line. The earlier formatting tripped the linter job; running black locally yields the canonical layout.

Review feedback (PR #1533, second pass):
- ``test_async_task_decorators::test_async_function_has_access_to_txt_text`` lost the final ``span_annotation.annotation_label.text == "TEXT_SPAN_ASYNC"`` assertion when the new ``test_async_doc_analyzer_task_txt_no_extract_file_raises`` test was appended in the previous review-fix commit. Restored — the end-to-end label-text check is the only place that pins the analyzer's annotation_label_text propagation through the async post-processing path.
- ``shared/Managers.py``: condense the 7-line explanatory block above the ``model_name is not None`` guard to one short paragraph (per CLAUDE.md comment style) and convert the ``assert`` to an explicit ``if model_name is None: raise RuntimeError(...)`` so the invariant survives ``python -O`` (where asserts are stripped). Same fix pattern as PR #1543; the reviewer didn't flag it on this PR but the security rationale is identical.

The reviewer's other action item — restoring ``list[TokenIdPythonType]`` annotations in ``llamaparse_parser.py`` that were demoted to comments — is out of scope: the comment-style demotions were introduced by PR #1544 (pipeline review fixes) which has already merged into ``main``; this PR does not touch ``llamaparse_parser.py``. A follow-up tracker should be opened against the pipeline graduation if those annotations need restoring.
raise RuntimeError(
    "internal invariant violated: corpus_id is None in "
    "auto-detect branch (the `else` should only be reached "
    "when corpus_id is set)"
)

raise RuntimeError(
    "internal invariant violated: corpus_id is None in "
    "auto-detect branch (the `else` should only be reached "
    "when corpus_id is set)"
)

raise RuntimeError(
    "internal invariant violated: corpus_id is None in the "
    "embedder-path-absent branch (constructor requires at "
    "least one of corpus_id / embedder_path)"
)
async def similarity_search(
    self, query: str, *, k: int = ..., **kwargs: Any
) -> Any: ...
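The `similarity_search` stub above is the single member of the Protocol described in the commit message. A self-contained sketch of how such a structural type might be used follows; `AgentVectorStoreProto` stands in for the real `_AgentVectorStoreProto`, and `InMemoryStore`, `search`, and the sample strings are hypothetical. The point is that any object with a matching method satisfies the type, so no import of the concrete store classes is needed.

```python
import asyncio
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class AgentVectorStoreProto(Protocol):
    """Structural type: anything exposing similarity_search() qualifies."""

    async def similarity_search(
        self, query: str, *, k: int = ..., **kwargs: Any
    ) -> Any: ...


class InMemoryStore:
    """Hypothetical store; it does not inherit from the Protocol."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    async def similarity_search(self, query: str, *, k: int = 3, **kwargs: Any) -> Any:
        # Naive substring match standing in for a real embedding search.
        return [d for d in self.docs if query.lower() in d.lower()][:k]


async def search(store: Optional[AgentVectorStoreProto], query: str) -> list[str]:
    if store is None:
        return []
    return await store.similarity_search(query, k=2)


store = InMemoryStore(["Indemnity clause", "Termination clause", "Notice"])
hits = asyncio.run(search(store, "clause"))
print(isinstance(store, AgentVectorStoreProto), hits)
```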
Code Review: Type-check
…-lgZT0 # Conflicts: # CHANGELOG.md # mypy.ini
Code Review — PR #1543 (Type-check opencontractserver.llms modules)
Overall this is a well-structured typing graduation. The changes are clearly scoped to the
Genuine bug fixes worth calling out explicitly
This is a real correctness fix.
Adding explicit null-guards before
Switching to
Guards against
Concerns
1. Semantic change in
# Before
documents: Optional[list[Document]] = None
if self.documents is None:
    self.documents = await sync_to_async(list)(self.corpus.get_documents())
# After
documents: list[Document] = field(default_factory=list)
if not self.documents:
    self.documents = await sync_to_async(lambda: list(self.corpus.get_documents()))()
The intent is to replace
The PR comment in the docstring acknowledges this ("Callers that want to explicitly state 'no documents — skip loading' should not invoke
2.
# Before
async def _stream_core(
    self,
    message: str,
    *,
    force_llm_id: int | None = None,
    force_user_msg_id: int | None = None,
    ...
# After
async def _stream_core(self, message: str, **kwargs: Any) -> ...:
    force_llm_id: Optional[int] = kwargs.pop("force_llm_id", None)
    ...
The PR correctly identifies that the mixin contract requires a compatible signature. However, the tradeoff is that callers now pass these parameters as untyped kwargs — a typo like
3. Permission guard tests are mostly importability checks, not behavioral tests
def test_aadd_document_note_is_importable(self) -> None:
    from opencontractserver.llms.agents import pydantic_ai_agents as paa
    self.assertTrue(callable(paa.aadd_document_note))
The test file acknowledges this limitation ("We can't easily invoke the inner closure from outside"). These tests verify the function names are importable but don't verify that
Two untested guards worth adding behavioral tests for:
4. Unreachable
async def _stream_core(self, ...) -> AsyncGenerator[UnifiedStreamEvent]:
    raise NotImplementedError("_stream_core() must be implemented by adapter")
    # The unreachable yield turns this into an async generator...
    yield cast(UnifiedStreamEvent, None)
This is a known Python trick to make a function an async generator. It works, but
5. The default
Nits / style
Summary
The PR achieves its goal — 9 modules graduate from
…-lgZT0 # Conflicts: # CHANGELOG.md
Targets the lines added by the llms.* mypy graduation in PR #1543 that codecov flagged as uncovered (75.62% patch coverage). New unit tests cover:
- core_agents: CorpusAgentContext.initialize() both branches (load / skip), CoreAgentBase._normalise_source content fallback chain, get_default_config defaults dict + None-override filter.
- pydantic_ai_agents: _extract_tool_result_summary (dict/string/None/truncation/error), _event_to_text_and_meta (all isinstance branches: TextPart, ToolCallPart, TextPartDelta, ToolCallPartDelta, unsupported), _usage_to_dict (None / model_dump / dataclass instance / dataclass *class*-object guard / unknown).
- client.SimpleLLMClient._chat_openai with and without max_tokens.
- core_vector_stores.CoreAnnotationVectorStore ctor: explicit-path, corpus-only, ValueError on unresolved embedder, validation error.
- core_conversation_vector_stores.{CoreConversationVectorStore, CoreChatMessageVectorStore} ctors: explicit-path with/without corpus, resolution-failure swallowed, auto-detect ValueError, auto-detect success.
- timeline_stream_mixin.TimelineStreamMixin.stream: timeline injection, _finalise_llm_message hook invocation, default _stream_core NotImplementedError.

Tests are mock-heavy and narrow — coverage tests, not behavioural contracts (those live in test_llms_typing_behavior_guards.py).
Code Review: PR #1543 — Type-check
| Severity | Issue |
|---|---|
| 🔴 Bug | add_document_note_tool raises for valid standalone-document agents |
| 🟡 Behavioral | Undocumented LLMS_VECTOR_STORE_FRAMEWORK setting split from LLMS_DOCUMENT_AGENT_FRAMEWORK |
| 🟡 Behavioral | Empty-corpus double-fetch after Optional[list] → list field change |
| 🟡 Type safety | **kwargs: Any in _stream_core silently swallows unknown keys |
| 🟡 Type asymmetry | int(corpus_id) cast missing in CoreConversationVectorStore.search |
| 🔵 Minor | Redundant list() wrap + spurious union members in _resolve_tools caller |
| 🔵 Pre-existing | Dead messages variable — follow-up ticket recommended |
The functional regression in add_document_note_tool should be fixed before merge. The remaining items are lower-priority but worth addressing.
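The "empty-corpus double-fetch" row in the table above follows from swapping an `is None` check for a truthiness check once the field defaults to `[]`. Here is a synchronous sketch of the effect; `FakeCorpus` and `Context` are hypothetical stand-ins for the real async `CorpusAgentContext`, with a counter added to make the repeated fetch visible.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCorpus:
    """Hypothetical corpus that counts how often documents are fetched."""

    docs: list[str] = field(default_factory=list)
    fetches: int = 0

    def get_documents(self) -> list[str]:
        self.fetches += 1
        return list(self.docs)


@dataclass
class Context:
    corpus: FakeCorpus
    documents: list[str] = field(default_factory=list)

    def initialize(self) -> None:
        # With a list default, "not loaded yet" and "loaded, but empty"
        # are indistinguishable, so an empty corpus is re-queried each time.
        if not self.documents:
            self.documents = self.corpus.get_documents()


empty = Context(FakeCorpus())
empty.initialize()
empty.initialize()

populated = Context(FakeCorpus(["a.pdf"]))
populated.initialize()
populated.initialize()

print(empty.corpus.fetches, populated.corpus.fetches)
```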
agent.stream() is an async def with yield — it returns an AsyncGenerator directly, not an awaitable. async for event in await agent.stream(...) was a left-over from a pre-typing iteration; mypy flags it now that the surrounding tasks module type-checks.
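For a plain async generator function, the stray `await` described above fails at runtime as well as under mypy. A small sketch with a hypothetical `ticks` generator shows both call styles:

```python
import asyncio
from collections.abc import AsyncGenerator


async def ticks(n: int) -> AsyncGenerator[int, None]:
    """Hypothetical async generator standing in for agent.stream()."""
    for i in range(n):
        yield i


async def with_await() -> str:
    try:
        # An async generator object is not awaitable, so this raises TypeError.
        async for _ in await ticks(2):  # type: ignore
            pass
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


async def without_await() -> list[int]:
    # Correct form: iterate the generator object directly.
    return [i async for i in ticks(2)]


outcome = asyncio.run(with_await())
collected = asyncio.run(without_await())
print(outcome, collected)
```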
Code Review: PR #1543 — Type-check
- Drop the corpus-required guard on add_document_note_tool: the underlying aadd_document_note accepts corpus_id=None, so standalone-document agents must keep working. Forward context.corpus.id when present and None otherwise.
- _stream_core now raises TypeError on unknown kwargs instead of silently swallowing typos.
- Mirror the int(self.corpus_id) cast on the sync Conversation vector store search paths so the ORM honours string-typed corpus IDs as CoreChatMessageVectorStore.search already does.
- Drop redundant list() wrap in api.py around _resolve_tools, cast through the wider factory parameter type instead.
- Document LLMS_VECTOR_STORE_FRAMEWORK in the LLM architecture README so operators understand the new setting is intentionally independent of LLMS_DOCUMENT_AGENT_FRAMEWORK.
- Add behaviour-guard tests pinning all three contracts.
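The unknown-kwargs guard described above can be sketched as a pop-then-check pattern. This is an illustrative stand-in, not the real `_stream_core`: the function is synchronous, returns a plain dict, and `typo_error` is a hypothetical helper. Only the two `force_*` parameter names come from the review thread.

```python
from typing import Any, Optional


def stream_core(message: str, **kwargs: Any) -> dict[str, Any]:
    """Simplified stand-in for _stream_core's keyword handling."""
    force_llm_id: Optional[int] = kwargs.pop("force_llm_id", None)
    force_user_msg_id: Optional[int] = kwargs.pop("force_user_msg_id", None)
    if kwargs:
        # Anything left over is a key the function does not understand.
        unknown = ", ".join(sorted(kwargs))
        raise TypeError(
            f"_stream_core() got unexpected keyword argument(s): {unknown}"
        )
    return {
        "message": message,
        "force_llm_id": force_llm_id,
        "force_user_msg_id": force_user_msg_id,
    }


def typo_error() -> str:
    try:
        stream_core("hi", force_lm_id=7)  # deliberate typo
    except TypeError as exc:
        return str(exc)
    return ""


print(stream_core("hi", force_llm_id=7))
print(typo_error())
```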
Code Review: PR #1543 — Type-check
Code Review: Type-check
Code Review: PR #1543 — Type-check opencontractserver.llms modules
Overall: This is a well-executed typing graduation. The ~125 type errors are fixed correctly, the behavioral side-effects are documented, and the test file covers the key new runtime contracts. A few things worth discussing before merge.
What's Done Well
Issues Worth Discussing
1. Semantic change in
After merging main (which renumbered the conflicting 0069 grounding migration to 0071 in #1581) the only remaining failing tests on this branch are the same two typing-test assertion drifts that were failing on main since #1543. Apply the same fixes already shipped via PR #1575:
- ``test_callable_resolves_via_from_function``: patch ``CoreTool.from_function`` (the bound name actually invoked at api.py:713), not the unused ``ToolAPI.from_function`` indirection.
- ``test_add_document_note_tool_passes_none_when_corpus_absent``: scope the assertNotIn to the ``add_document_note``-specific message so it doesn't accidentally match ``add_exact_string_annotations``'s legitimate corpus-required guard at line 2497.

Also address Claude bot review on this PR:
- ``leaderboardAvatar.ts``: export ``AVATAR_VIOLET`` / ``AVATAR_PINK`` so theme audits can import the literals directly.
- ``user_types.py``: hoist ``_stripped`` from inside ``resolve_display_name`` to a module-level private helper — the resolver runs per leaderboard query, no need to rebuild a function object each call.
- ``useTabVisibilityRefresh``: in development, log refetch errors (sync throws + rejected promises) instead of silently swallowing them. Production behaviour is unchanged so caller query state remains the source of truth there.
…ts (#1582) Two tests added in PR #1543 (Type-check opencontractserver.llms modules) have been failing on main, blocking pytest (and therefore codecov) on every PR. Both tests assert against the wrong target:
1. test_callable_resolves_via_from_function patches opencontractserver.llms.api.ToolAPI.from_function, but _resolve_tools calls CoreTool.from_function directly. The mock never intercepts the actual call site, so _resolve_tools returns real CoreTool instances and the assertEqual against the sentinel fails. Patch CoreTool.from_function instead.
2. test_add_document_note_tool_passes_none_when_corpus_absent does a global file scan with assertNotIn("requires the agent to be scoped to a corpus"), but that string legitimately exists for add_exact_string_annotations (a different tool that does require a corpus). Scope the assertion to just the body of add_document_note_tool by slicing the file from its def to the next async def at the same indent level.
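The first failure mode above, patching a name that the code under test never calls, can be reproduced in a few lines with `unittest.mock`. The classes below are simplified stand-ins that reuse the names from the commit message; their bodies and the `resolve_tools` helper are hypothetical.

```python
from unittest.mock import patch, sentinel


class CoreTool:
    """Simplified stand-in for the real CoreTool."""

    @classmethod
    def from_function(cls, fn):
        return ("core", fn.__name__)


class ToolAPI:
    """Thin indirection that the resolver below does not go through."""

    @staticmethod
    def from_function(fn):
        return CoreTool.from_function(fn)


def resolve_tools(tools):
    # Calls CoreTool.from_function directly, mirroring the described call site.
    return [CoreTool.from_function(t) if callable(t) else t for t in tools]


def hello():
    return "hi"


# Patching the unused indirection: the real classmethod still runs.
with patch.object(ToolAPI, "from_function", return_value=sentinel.tool):
    missed = resolve_tools([hello])

# Patching the name that is actually invoked: the mock intercepts the call.
with patch.object(CoreTool, "from_function", return_value=sentinel.tool):
    hit = resolve_tools([hello])

print(missed, hit)
```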
…1567) * Leaderboard: redact OAuth provider IDs + drop aggressive polling Closes #1557. A. The USER column was rendering raw Auth0 ``sub`` values like ``google-oauth2|114688257717759010643`` because the leaderboard resolvers selected ``user.username`` directly, and Django's ``User.username`` is set to the OAuth ``sub`` claim for social-login users (see ``jwt_get_username_from_payload_handler``). Added a ``displayName`` resolver to ``UserType`` with the priority ``name`` → ``given_name`` + ``family_name`` → ``first_name`` + ``last_name`` → ``username`` (when not a ``provider|sub``) → redacted ``user_<last 6>`` fallback. ``GET_LEADERBOARD`` and ``GET_DISCOVERY_DATA`` (plus the matching TS types and components) now select / render ``displayName``. Backend regression coverage in ``test_user_display_name.py`` pins the priority chain and the no-leak guarantee. B. The Community Leaderboard previously polled every 60s (``GET_LEADERBOARD``) and 120s (``GET_COMMUNITY_STATS``) regardless of tab visibility, which re-rendered rows and reset in-flight UI state. Dropped both ``pollInterval``s; queries now use ``cache-and-network`` + ``cache-first`` next-fetch + silent network-status updates so the user sees instant cached data while a single background refresh runs on mount or filter change. A ``visibilitychange`` listener performs one refetch when the tab returns to the foreground; hidden tabs no longer hit the network. * Address review: fix short-sub pipe leak in display name redaction The redacted fallback in UserType.resolve_display_name took the last 6 characters of the raw username, which for sub strings shorter than 7 characters (e.g. auth0|abcde, length 11) included the | separator — returning user_|abcde. test_redacts_short_oauth_sub asserts the result must not contain |, so the test failed against the original code. Fix: split on the last | and slice the suffix from the sub part only, so the redaction never carries the provider prefix or the separator. 
auth0|abcde → user_abcde, google-oauth2|114688257717759010643 → user_010643 (unchanged), a|b → user_b. Also extracted the magic 6 into OAUTH_SUB_DISPLAY_SUFFIX_LENGTH in opencontractserver/constants/auth.py per CLAUDE.md ("No magic numbers"), and added a comment on the bare "user" fallback documenting that it is intentionally non-unique because Django enforces non-empty username so the branch is effectively unreachable. Verified: docker compose -f test.yml -p opencontracts run --rm django pytest opencontractserver/tests/test_user_display_name.py — 9 passed. Refs PR #1567 review. * Extract leaderboard utilities to shared modules + add unit tests Extracts the local getInitials and getAvatarColor helpers from CompactLeaderboard.tsx into frontend/src/utils/leaderboardAvatar.ts so they can be unit-tested independently of the rendering layer. Extracts the visibility-driven refresh logic from Leaderboard.tsx into a small useTabVisibilityRefresh hook (frontend/src/hooks/) with its own unit tests, replacing the inline useEffect. Net: same runtime behaviour, but the OAuth/multi-token initial branches and the visibility-change refetch path are now covered by direct unit tests instead of indirect rendering coverage. * Address review: remove issue refs, harden hook, tighten tests - Drop Issue #1557 inline references from source files per CLAUDE.md - Make useTabVisibilityRefresh stable across renders via useRef so consumers no longer need useMemo on the refresh fn array - Pin the redacted suffix value in test_redacts_short_oauth_sub - Add coverage for whitespace-only given_name/family_name - Move loose hex literals in leaderboardAvatar to named constants - Drop the SSR-only typeof document guard (Vite SPA, no SSR) - Migrate the new hook test to the project's React-18-native renderHook * tests: disable note post_save signal in pytest fixtures NoteVisibilityTest.setUpTestData was failing in CI with EmbeddingGenerationError because Note.objects.create(...) 
triggers process_note_on_create_atomic, which schedules calculate_embedding_for_note_text via eager celery. With CELERY_TASK_EAGER_PROPAGATES=True, the embedder lookup failure (empty default_embedder in PipelineSettings, likely from cross-worker Redis cache contamination in pytest-xdist) propagates up through the signal and aborts the class fixture. The Document and Annotation post_save signals are already disabled in conftest.py for exactly this reason. Note creation has the same hazard (it schedules an embedding task) and should be handled the same way. Mirror the existing disconnect/reconnect pattern for the Note signal in both `disable_document_processing_signals` (autouse, session-scoped) and the opt-in `enable_doc_processing_signals` fixture so integration tests can still exercise the full pipeline when they need to. * PR #1567: address review + unblock pytest after main rebase After merging main (which renumbered the conflicting 0069 grounding migration to 0071 in #1581) the only remaining failing tests on this branch are the same two typing-test assertion drifts that were failing on main since #1543. Apply the same fixes already shipped via PR #1575: - ``test_callable_resolves_via_from_function``: patch ``CoreTool.from_function`` (the bound name actually invoked at api.py:713), not the unused ``ToolAPI.from_function`` indirection. - ``test_add_document_note_tool_passes_none_when_corpus_absent``: scope the assertNotIn to the ``add_document_note``-specific message so it doesn't accidentally match ``add_exact_string_annotations``'s legitimate corpus-required guard at line 2497. Also address Claude bot review on this PR: - ``leaderboardAvatar.ts``: export ``AVATAR_VIOLET`` / ``AVATAR_PINK`` so theme audits can import the literals directly. - ``user_types.py``: hoist ``_stripped`` from inside ``resolve_display_name`` to a module-level private helper — the resolver runs per leaderboard query, no need to rebuild a function object each call. 
- ``useTabVisibilityRefresh``: in development, log refetch errors (sync throws + rejected promises) instead of silently swallowing them. Production behaviour is unchanged so caller query state remains the source of truth there. * Address review: gate email at resolver, fix initials whitespace bug, log refresh errors unconditionally Addresses comments on PR #1567: - Gate UserType.email at the resolver (self / superuser only) so the leak is closed regardless of which client selects the field; drop email from leaderboard query/type/mocks. Coverage in EmailResolverTestCase pins self / superuser / cross-user / anonymous / blank-email / context-missing matrix. - Fix getLeaderboardInitials: trimmed-token fallback so leading-whitespace single-token names render the correct initials instead of whitespace; whitespace-only names return "?". - useTabVisibilityRefresh logs callback errors via console.error unconditionally so production failures surface in the browser console / Sentry. - Tighten test_redacts_short_oauth_sub to make the OAUTH_SUB_DISPLAY_SUFFIX_LENGTH invariant explicit (assertLess on the test premise). * Address review: gate OAuth-sub redaction on is_social_user, doc clarifications Addresses Claude review feedback on PR #1567: - config/graphql/user_types.py: gate the OAuth-sub redaction in resolve_display_name on is_social_user rather than the presence of "|" in the username. The project's UserUnicodeUsernameValidator (allows | * \ in addition to the Django default set) means a local user named e.g. "alice|admin" is legitimate; the previous "|"-only check would have falsely redacted them. Redaction now fires only for social-login users where username == Auth0 sub. - opencontractserver/tests/test_user_display_name.py: existing redact-OAuth-sub tests now set is_social_user=True; added a new test_does_not_redact_local_username_with_pipe pinning the local-user pass-through behavior. 
- config/graphql/user_types.py: small inline comment on the email graphene field declaration explaining why the override is needed (so resolve_email's gate is not bypassed by the auto-exposed DjangoObjectType field). - frontend/src/hooks/useTabVisibilityRefresh.ts: clarifying notes-for- callers comment block — initial-mount fetch is intentionally not fired here (callers rely on Apollo's cache-and-network), and rapid visibility flips are absorbed by Apollo's query deduplication rather than an internal debounce. CHANGELOG.md updated to reflect the is_social_user gate. --------- Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com>
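The short-sub redaction fix described in the commit message above (split on the last `|`, then take the suffix from the sub part only) can be sketched as a standalone function. `redact_oauth_sub` is a hypothetical name, and the sketch omits the `is_social_user` gate and the rest of the display-name priority chain. The constant name, its value of 6, and the three example inputs come from the commit text.

```python
OAUTH_SUB_DISPLAY_SUFFIX_LENGTH = 6


def redact_oauth_sub(username: str) -> str:
    """Return a redacted display name for an OAuth `provider|sub` username."""
    # Keep only the part after the last "|", so neither the provider prefix
    # nor the separator can leak into the result.
    sub = username.rsplit("|", 1)[-1]
    suffix = sub[-OAUTH_SUB_DISPLAY_SUFFIX_LENGTH:]
    return f"user_{suffix}" if suffix else "user"


for name in ("auth0|abcde", "google-oauth2|114688257717759010643", "a|b"):
    print(name, "->", redact_oauth_sub(name))
```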
#1477) (#1533) * typing: graduate opencontractserver.shared.{Managers,decorators} (closes #1477) Drains the last two `[mypy-opencontractserver.shared.*]` ignore-blocks from `mypy.ini`. Continues #1470 (which graduated QuerySets/fields/mixins) and finishes the shared package's mypy migration. Removed from mypy.ini - opencontractserver.shared.Managers - opencontractserver.shared.decorators Pruned from docs/typing/mypy_baseline.txt - 37 lines (6813 -> 6776) Per-file fixes shared/Managers.py - Drop the `if TYPE_CHECKING / else AbstractUser as User` aliasing block (mypy treated `User` as a value, not a type — root cause of seven `valid-type` errors in the baseline). Replace User-as-type with AbstractBaseUser on EmbeddingManager.store_embedding(creator=...) and UserFeedbackManager.by_creator(creator=...), and Any for _apply_document_prefetches(user=...) which duck-types is_anonymous / is_superuser. - Cast self.model to Any once in BaseVisibilityManager.visible_to_user so the seven `model_cls.objects.*` sites typecheck. Same cast on the one self.model.DoesNotExist site in UserFeedbackManager.get_or_none. _default_manager would change call semantics for models that override objects, so the cast is preferred. - Coalesce self.model._meta.model_name or "" so the two .upper() calls don't union-attr against None. - Align PermissionManager.visible_to_user and UserFeedbackManager.visible_to_user with BaseVisibilityManager.visible_to_user(user, lightweight, with_doc_label_annotations) — kwargs accepted for parity, no behaviour change. Matches the pattern adopted in CorpusActionExecutionManager (PR #1473 review). - Silence `[misc]` on the PermissionManager.from_queryset(...) dynamic base for AnnotationManager and NoteManager. Documented inline. - Remove three unused `for_user(...)` delegation methods on PermissionManager / AnnotationManager / NoteManager. They called self.get_queryset().for_user(...) 
against querysets that never implement `for_user`, so any caller would have raised AttributeError at runtime. Verified no callers in opencontractserver/, config/, or tests. shared/decorators.py - Annotate pdf_text_extract as Optional[str] in both wrappers and assert it is not None inside the TXT branch (only reachable when txt_extract_file is non-empty, so the assert is a no-op at runtime). - Annotate pdf_data_layer as Any in both wrappers to absorb the `[] | PdfDataLayer` union without forcing plasmapdf stubs. - Add `-> Any` return annotations to the three remaining inner helpers (get_analysis_creator, get_analysis_analyzer, celery_task_with_async_to_sync.wrapper). Coverage - Managers.py: 100% return-annotation coverage (24/24) - decorators.py: 100% return-annotation coverage (16/16) Verification - pre-commit run --all-files -> all hooks pass (mypy: Passed) - No regressions in mypy across opencontractserver + config (1032 source files checked). * Fix model_cls in get_or_none, replace assert with guard, add manager coverage tests - UserFeedbackManager.get_or_none: hoist model_cls local so model_cls.DoesNotExist is evaluated once, not inside except clause (cleaner and avoids repeated cast on the hot path) - opencontractserver/shared/decorators.py: replace bare assert with an explicit if/raise guard so the invariant check survives python -O; update comment to reflect it is a fast-fail runtime guard, not a type-narrowing hint - opencontractserver/tests/test_shared_managers.py: new test module covering PermissionManager, UserFeedbackManager, and UserFeedbackManager.get_or_none for the user=None coercion path and hit/miss paths (boosts patch coverage for lines graduated from the mypy baseline in #1477) - mypy.ini: remove stray duplicate ignore_errors line left by merge-conflict resolution - opencontractserver/utils/embeddings.py: cast get_embedder return to type[BaseEmbedder] | None (get_component_by_name returns the broader PipelineComponentBase); fixes new mypy error 
surfaced after merging the utils.embeddings graduation from main * Add coverage for decorators.py guard paths and manager superuser branch - test_task_decorators.py: two new tests for the ``pdf_text_extract is None`` guard (decorators.py lines 273-276) covering both ``text/plain`` and ``application/txt`` MIME types - test_shared_managers.py: two new test classes covering BaseVisibilityManager.visible_to_user(superuser) (model_cls.objects.all() branch) and PermissionManager.visible_to_user(superuser); also imports clean-up (unused Annotation import removed) * Address review: assert non-None model_name, drop magic pk in get_or_none miss test - shared/Managers.py: replace silent 'or ""' fallback on Options.model_name with an explicit assert so an abstract-model invariant violation surfaces as a clear AssertionError instead of cascading through the permission table lookup as an empty string. - tests/test_shared_managers.py: compute a guaranteed-missing pk by taking max(pk) + 1 instead of hard-coding 999999999 (CLAUDE.md no- magic-numbers rule). * Fix mypy: remove redundant cast in get_embedder return (#1545) The cast(Optional[type[BaseEmbedder]], embedder_class) on the return statement was redundant because mypy infers the correct type from the assignments throughout the function body. The function's return type annotation already declares the same type, so mypy reports [redundant-cast]. This is the only mypy error on main and has been blocking the linter step of Backend CI for multiple commits, which in turn cascade-skips the pytest job and prevents Codecov from receiving fresh coverage. * typing: address review issues from pipeline.* mypy graduation (PR #1540) (#1544) Follow-up to #1540 (merged) addressing issues raised in the Claude code review: - Issue 1: Mark `dependencies` and `input_schema` as `ClassVar` on `PipelineComponentBase` to prevent shared mutable state across subclasses. 
Remove now-redundant declarations from `BaseParser`, `BaseEmbedder`, `BaseThumbnailGenerator`, `BasePostProcessor`, and `BaseReranker`. Mark `supported_file_types` and `supported_modalities` as `ClassVar` too.
- Issue 2: Update `find_image_tokens_in_bounds` signature to accept `list[PawlsTokenPythonType]` (matches callers) instead of `list[dict[str, Any]]`, eliminating 3 `cast()` calls in `llamaparse_parser`.
- Issue 3: Remove internal `cast(dict[str, Any], source)` from `LlamaParseParser._build_image_token`; `PawlsTokenPythonType` already declares all accessed fields as `NotRequired`, so `source.get(...)` is type-safe without widening.
- Issue 5: Remove redundant `description`, `author`, `dependencies`, `input_schema` declarations from `BasePostProcessor` (inherited from base). Remove type annotations from concrete subclass overrides of `ClassVar` fields in `PDFRedactor`, `TxtParser`, and `NoopReranker`.
- Issue 6: Restore `image_token_refs` type annotation on first declaration per scope in `llamaparse_parser`; bare assignments in subsequent scopes avoid `no-redef` errors while preserving intent via inline comment.
- Fix `get_embedder` return in `utils/embeddings.py`: cast the full tuple to `tuple[Optional[type[BaseEmbedder]], Optional[str]]` to satisfy mypy, which could not infer the narrowed type from a first-element cast alone.
All pre-commit hooks (black, isort, flake8, mypy) pass.
* Demote chatty permissioning INFO logs to DEBUG (#1525)
* Demote chatty permissioning INFO logs to DEBUG
The get_users_permissions_for_obj/App name pair fired on essentially every authenticated request, dominating Django pod logs and inflating storage costs. Collapses the two lines into one logger.debug() call using lazy %-formatting so the message is not built unless DEBUG is enabled. Permission denial logs and WARNING/ERROR paths are unchanged.
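As a rough illustration of the lazy %-formatting mentioned above (a sketch only: the logger name, `ListHandler`, `Expensive`, and `log_permission_lookup` are hypothetical and not taken from the repository), the arguments are rendered to text only when DEBUG is enabled:

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() is where "msg % args" is actually evaluated.
        self.messages.append(record.getMessage())


class Expensive:
    """Counts how many times it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "obj#1"


logger = logging.getLogger("permissioning_demo")  # hypothetical logger name
logger.propagate = False  # keep the demo isolated from root handlers
handler = ListHandler()
logger.addHandler(handler)


def log_permission_lookup(obj: object, app_name: str) -> None:
    # One call, lazy %-style arguments: nothing is interpolated unless
    # the logger is enabled for DEBUG.
    logger.debug("permissions for %s (app=%s)", obj, app_name)


probe = Expensive()

logger.setLevel(logging.INFO)
log_permission_lookup(probe, "documents")
renders_at_info = probe.renders  # DEBUG disabled: message never built

logger.setLevel(logging.DEBUG)
log_permission_lookup(probe, "documents")
renders_at_debug = probe.renders  # DEBUG enabled: rendered once
```

An f-string in the same position would call `__str__` on every request regardless of log level, which is the cost the commit avoids.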
Closes #1436
* Merge main and apply embeddings.py mypy fix
The latest main brought a mypy regression in opencontractserver/utils/embeddings.py where get_component_by_name returns type[PipelineComponentBase], but the function signature requires type[BaseEmbedder]. Cast the result and annotate the local embedder_class to satisfy the stricter return type.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Fix mypy: remove redundant cast on get_embedder return tuple
PR #1544 reintroduced a redundant tuple cast on the get_embedder return statement, which mypy flags as [redundant-cast]. The variables embedder_class (annotated Optional[type[BaseEmbedder]]) and embedder_path (Optional[str]) already carry the correct types via type narrowing, so no cast is needed. This restores the fix from #1545 and unblocks the linter step on all currently open PRs.
* Address review: add async TXT guard + retarget BaseVisibilityManager test
Two follow-ups from the Claude review on PR #1533:
1. async_doc_analyzer_task was missing the runtime guard that the sync wrapper added, `if pdf_text_extract is None: raise ValueError(...)`, inside the text/plain branch. The async post-processing path uses span['text'] (not a slice into pdf_text_extract) so the guard is defensive rather than required by the immediate code, but the contract is the same: TXT processing requires a non-empty txt_extract_file. Mirroring the sync invariant fails fast instead of silently saving annotations from a doc the analyzer couldn't compute spans for. Two new regression tests pin the contract for text/plain and application/txt.
2. BaseVisibilityManagerSuperuserTest claimed to exercise BaseVisibilityManager.visible_to_user but the test routed through Corpus.objects, which is a PermissionManager whose visible_to_user is fully overridden, so the base manager's superuser / anonymous / guardian-fallback branches were never reached.
Retarget the test to the Embedding model (annotations.Embedding uses EmbeddingManager(BaseVisibilityManager) without an override), so the call now lands directly in BaseVisibilityManager. Three subtests pin the superuser-sees-all branch, the anonymous-only-public branch (user=None coercion), and the unrelated-user guardian-fallback branch.
Also fixes the embeddings.py redundant-cast regression that resurfaced during the main merge (post-#1545 cast is genuinely redundant, mypy was right; restore the no-cast form).
* Fix CI lint, restore dropped assertion, harden Managers invariant
CI fix:
- ``test_shared_managers.py``: re-format the multi-line ``order_by + values_list + first`` chain that black wanted as a single line. The earlier formatting tripped the linter job; running black locally yields the canonical layout.
Review feedback (PR #1533, second pass):
- ``test_async_task_decorators::test_async_function_has_access_to_txt_text`` lost the final ``span_annotation.annotation_label.text == "TEXT_SPAN_ASYNC"`` assertion when the new ``test_async_doc_analyzer_task_txt_no_extract_file_raises`` test was appended in the previous review-fix commit. Restored: the end-to-end label-text check is the only place that pins the analyzer's annotation_label_text propagation through the async post-processing path.
- ``shared/Managers.py``: condense the 7-line explanatory block above the ``model_name is not None`` guard to one short paragraph (per CLAUDE.md comment style) and convert the ``assert`` to an explicit ``if model_name is None: raise RuntimeError(...)`` so the invariant survives ``python -O`` (where asserts are stripped). Same fix pattern as PR #1543; the reviewer didn't flag it on this PR but the security rationale is identical.
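The difference between an ``assert`` and an explicit ``raise`` under ``python -O`` can be reproduced in isolation. The sketch below is illustrative only: `resolve_model_name`, `resolve_with_assert`, and `assert_guard_fires` are hypothetical names, and the optimisation level is simulated with the built-in `compile(..., optimize=...)` rather than by restarting the interpreter.

```python
from typing import Optional


def resolve_model_name(model_name: Optional[str]) -> str:
    """Hypothetical helper using the explicit-guard style described above."""
    if model_name is None:
        # An explicit raise is ordinary code, so it is kept under -O.
        raise RuntimeError("model_name is None; abstract models are not supported")
    return model_name


# The assert-based variant, kept as source text so it can be compiled
# at different optimisation levels.
ASSERT_VERSION = """
def resolve_with_assert(model_name):
    assert model_name is not None
    return model_name
"""


def assert_guard_fires(optimize: int) -> bool:
    """Compile the assert variant at the given level and report whether
    passing None raises AssertionError."""
    namespace: dict = {}
    code = compile(ASSERT_VERSION, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    try:
        namespace["resolve_with_assert"](None)
    except AssertionError:
        return True
    return False


fires_without_opt = assert_guard_fires(0)  # asserts kept
fires_with_opt = assert_guard_fires(1)     # asserts stripped, as with -O
```

With `optimize=1` the assert variant returns `None` silently, whereas `resolve_model_name(None)` raises at every optimisation level.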
The reviewer's other action item (restoring ``list[TokenIdPythonType]`` annotations in ``llamaparse_parser.py`` that were demoted to comments) is out of scope: the comment-style demotions were introduced by PR #1544 (pipeline review fixes) which has already merged into ``main``; this PR does not touch ``llamaparse_parser.py``. A follow-up tracker should be opened against the pipeline graduation if those annotations need restoring.
* Address review: improve TXT error message, trim test polish
- TXT-extract guard now reports the actual file_type so the message is accurate for both application/txt and text/plain.
- Drop the duplicated assertIsNotNone before the mypy-narrowing assert is not None in test_get_or_none_returns_object_on_hit.
- Document the rationale for in-setUp imports in BaseVisibilityManagerSuperuserTest (AppConfig.ready() ordering).
- Trim the verbose async TXT-guard comment in decorators.py to one short line, matching the sync path.
* Lift backend patch coverage above 90% (close codecov gap on #1533)
The patch coverage report flagged 5 missed lines in Managers.py (84.84% / target 87.98%):
- line 180 raise RuntimeError on abstract-model guard
- lines 203/206/209 unreachable branches inside the if/elif chain in BaseVisibilityManager.visible_to_user (None / superuser / anonymous), all dead because the method already returns early at lines 154-164 for those user states
- line 302 user = AnonymousUser() in PermissionManager.visible_to_user (existing test used Corpus which has its own PermissionedTreeQuerySet.as_manager() and never reaches PermissionManager)
Changes:
- Managers.py: drop the dead None/superuser/anonymous branches that follow the early-return chain (CLAUDE.md "no dead code"). Keep the authenticated-non-superuser path with a clear comment about the invariant. The model_cls.objects.none() initializer stays so the outer except still has a defined fallback queryset.
- test_shared_managers.py:
  - PermissionManagerVisibleToUserViaNoteTest: exercises Note.objects.visible_to_user(user=None), which actually goes through PermissionManager.visible_to_user (NoteManager is built via PermissionManager.from_queryset(NoteQuerySet)) and covers the AnonymousUser coercion path.
  - BaseVisibilityManagerAbstractModelGuardTest: patches Embedding._meta.model_name to None for the duration of one call to trip the explicit raise, verifying the outer except handler catches the RuntimeError and falls through to the creator/public fallback. This exercises the explicit guard introduced in review feedback (assert → raise) so it survives python -O.
* Fix mypy after main merge: cast self.model for blob_field_names()
After merging main (which added Document.blob_field_names()), the DocumentManager.unique_blob_paths_for_many() loop calls self.model.blob_field_names(), but self.model is type[_T] which mypy cannot resolve to Document. Cast it to type[Document] inside the method to make the attribute access type-safe without weakening the surrounding function signature.
* Fix CI on PR #1533: scope source-greps and surface abstract-model guard
- test_callable_resolves_via_from_function: patch the actual call site (CoreTool.from_function), not the ToolAPI wrapper that _resolve_tools no longer routes through.
- test_add_document_note_tool_passes_none_when_corpus_absent: scope the source-grep to the add_document_note_tool body so the corpus-required diagnostic on add_exact_string_annotations no longer trips it.
- Address review #1: hoist the abstract-model RuntimeError raise above the broad except in BaseVisibilityManager.visible_to_user so the bug surfaces instead of falling back to creator/public filtering. Update the corresponding test to assert the raise.
- Address review #2: drop unnecessary string forward reference in cast(type[Document], self.model); Document is already imported at runtime above the call site.
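The effect of hoisting a guard above a broad ``except`` (review item #1 above) can be shown with a small stand-alone sketch. The function names and the even-number filter are hypothetical stand-ins for the permission lookup and the creator/public fallback; this is not the code in ``Managers.py``.

```python
from typing import Optional


def visible_ids_guard_inside(model_name: Optional[str], ids: list[int]) -> list[int]:
    """Guard inside the try block: the broad except swallows the invariant error."""
    try:
        if model_name is None:
            raise RuntimeError("abstract model: model_name is None")
        return [i for i in ids if i % 2 == 0]  # stand-in for the permission lookup
    except Exception:
        return []  # stand-in for the creator/public fallback


def visible_ids_guard_hoisted(model_name: Optional[str], ids: list[int]) -> list[int]:
    """Guard hoisted above the try block: the invariant error reaches the caller."""
    if model_name is None:
        raise RuntimeError("abstract model: model_name is None")
    try:
        return [i for i in ids if i % 2 == 0]  # stand-in for the permission lookup
    except Exception:
        return []  # fallback still covers genuine lookup failures


swallowed = visible_ids_guard_inside(None, [1, 2, 3, 4])

try:
    visible_ids_guard_hoisted(None, [1, 2, 3, 4])
    surfaced = False
except RuntimeError:
    surfaced = True
```

In the first variant a programming error is indistinguishable from an ordinary lookup failure; in the second it fails loudly while the fallback keeps covering runtime errors inside the lookup itself.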
* Strip trailing whitespace; address review on PermissionManager / NoteManager
- Pre-commit's trailing-whitespace hook fixes a stray blank line inside a comment block in test_llms_typing_behavior_guards.py (CI was failing on this in the linter job).
- PermissionManager.visible_to_user gains a docstring note that the superuser branch lives in PermissionQuerySet, not in the base manager shortcut bypassed by this override.
- NoteManager picks up a one-line comment cross-referencing the AnnotationManager type-ignore rationale so the silenced dynamic-base-class warning is self-documenting.
Summary
Graduate the remaining `opencontractserver.llms.*` modules (agents, api/client, vector_stores) out of the mypy baseline by fixing type errors and adding proper type annotations. This resolves issue #1484 and completes the typing work started in #1468.

Key Changes
Core Agent Framework (`core_agents.py`)
- `CorpusAgentContext.documents` now uses `field(default_factory=list)` instead of `Optional[list[Document]]`
- `initialize()` safely handles async document loading with proper type narrowing
- `stream()` protocol method signature changed from `async def` to `def` (returns async generator, not coroutine)
- `resume_with_approval()` returns `AsyncGenerator[UnifiedStreamEvent, None]` consistently
- `Optional` type hints added to `sources` and `metadata` parameters across message methods
- `_stream_raw()` abstract method with unreachable yield to satisfy type checker
- `_normalise_source()` with explicit string conversion

PydanticAI Agent Implementation (`pydantic_ai_agents.py`)
- `_stream_core()` signature accepts `**kwargs: Any` for compatibility with mixin contract
- Parameters (`force_llm_id`, `force_user_msg_id`, etc.) extracted from `**kwargs` with proper defaults
- `effective_history` variable
- `# type: ignore` comments removed where types are now clear
- `_finalise_llm_message()` and `complete_message()`
- `resume_with_approval()` safely handles `MessageState` enum migration with fallback
- `_vs_kwargs` dictionary

Agent Factory (`agent_factory.py`)
- `Any` import added for proper type annotations

API Module (`api.py`)
- `ToolType` changed from `Union[str, CoreTool, callable]` to `Union[str, CoreTool, Callable[..., Any]]`
- `_resolve_framework()` helper
- `from_function()` parameter type changed from `callable` to `Callable[..., Any]`
- `get_structured_response_and_sources_from_document()`

Vector Store Modules
- `core_vector_stores.py`: Fixed embedder initialization with proper null checks and assertions; updated return type to `AnnotationQuerySet`
- `core_conversation_vector_stores.py`: Added `ChatMessageQuerySet` import; improved embedder resolution with conditional logic
- `pydantic_ai_vector_stores.py`: Added `Callable` import for proper type hints

Timeline & Mixin Support
- `timeline_stream_mixin.py`: Fixed `_stream_core()` return type to `AsyncGenerator[UnifiedStreamEvent, None]` with unreachable yield
- `timeline_utils.py`: Added `cast` import and `TimelineEntry` import for type safety

Configuration
- Removed `ignore_errors = True` directives from `mypy.ini` for llms modules

Implementation Details
- `sync_to_async(lambda: ...)()` pattern for safe queryset execution in async contexts
- `# nosec B101` comments for intentional assertions used as type guards
- `**kwargs` retained in public methods, with typed parameters extracted internally
- `cast()` for type narrowing where runtime behavior is guaranteed but the type checker needs help

https://claude.ai/code/session_016L1wbQaPdTKtpuJnxErihF
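The `stream()` signature change and the "unreachable yield" noted under Key Changes both follow from how Python types async generators. The sketch below is a simplified illustration under assumed names: `Event`, `StreamingAgent`, `BaseAgent`, `EchoAgent`, and `collect` are hypothetical stand-ins, not the classes in `core_agents.py`.

```python
import asyncio
import inspect
from typing import AsyncGenerator, Protocol


class Event(str):
    """Hypothetical stand-in for UnifiedStreamEvent."""


class StreamingAgent(Protocol):
    # Plain ``def``: calling stream() hands back the async generator directly.
    # An ``async def`` with no yield in its body would instead describe a
    # coroutine that has to be awaited before the generator is available.
    def stream(self, message: str) -> AsyncGenerator[Event, None]: ...


class BaseAgent:
    async def _stream_raw(self, message: str) -> AsyncGenerator[Event, None]:
        # The yield never runs, but its presence makes this function an
        # async generator, matching the signature of concrete overrides.
        if False:
            yield Event("")
        raise NotImplementedError


class EchoAgent(BaseAgent):
    async def stream(self, message: str) -> AsyncGenerator[Event, None]:
        for word in message.split():
            yield Event(word)


async def collect(agent: StreamingAgent, message: str) -> list[str]:
    # No ``await`` on agent.stream(...): it is iterated, not awaited.
    return [str(event) async for event in agent.stream(message)]


words = asyncio.run(collect(EchoAgent(), "type check passes"))
is_async_gen = inspect.isasyncgenfunction(BaseAgent._stream_raw)
```

Without the dead `yield`, `_stream_raw` would be a plain coroutine function, and a checker would likely report its overrides as incompatible with the base signature.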