Skip to content

Add API boundary schemas#241

Merged
wpak-ai merged 8 commits into
cppalliance:developfrom
jonathanMLDev:feat/api-boudary-schemas
May 27, 2026
Merged

Add API boundary schemas#241
wpak-ai merged 8 commits into
cppalliance:developfrom
jonathanMLDev:feat/api-boudary-schemas

Conversation

@jonathanMLDev
Copy link
Copy Markdown
Collaborator

@jonathanMLDev jonathanMLDev commented May 25, 2026

Summary:

Service functions accepted raw dict for external API data (e.g. get_or_create_slack_team(team_data: dict)). Untyped dicts flowed from fetchers through services via .get() key access, producing KeyError at arbitrary call depth on malformed payloads. Pyright could not verify data flow from fetcher → service.

Changes:

  • GitHub — new github_activity_tracker/api_schemas.py (GitHubIssue, GitHubPullRequest, GitHubCommit, bundle wrappers, GitHubApiValidationError). fetcher.py validates at yield time; sync/issues_and_prs.py and sync/commits.py accept typed bundles; workspace JSON files are validated on load.

  • Slack — new cppa_slack_tracker/api_schemas.py (SlackTeamPayload, SlackChannelPayload, SlackMessagePayload, SlackUserPayload, SlackApiValidationError). fetcher.py returns typed payloads; all three service boundary functions updated; get_or_create_slack_user in cppa_user_tracker updated; sync callers follow.

  • Discord (live-sync path) — new discord_activity_tracker/live_sync_schemas.py (DiscordLiveUserPayload, DiscordLivePreparedMessage, DiscordReactionPayload, DiscordLiveSyncValidationError). sync/utils.py parse_discord_user now returns a typed model; sync/messages.py _prepare_message_data returns Optional[DiscordLivePreparedMessage]; bulk_upsert_discord_users/messages/reactions in services.py accept typed lists — eliminating the KeyError paths documented in their own docstrings. (Discord ChatExporter path was already validated via staging_schema.py; untouched.)

  • Cross-cutting — core/errors.py classify_failure now maps pydantic.ValidationError and all four app *ValidationError types to VALIDATION. cppa_slack_tracker added to pyrightconfig.json include.

  • Tests — test_api_schemas.py for GitHub and Slack; test_live_sync_schemas.py for Discord; test_sync_utils.py updated for typed return; test_errors.py extended; test_get_or_create_slack_team_requires_team_id asserts SlackApiValidationError.

Refs: Eval Test 6 / B1. Unblocks contract testing (issue 7 / B8).

Related Issues

Summary by CodeRabbit

  • New Features

    • Live-sync ingestion now accepts typed payload schemas for Slack, Discord, and GitHub.
  • Improvements

    • Incoming API payloads are validated and classification of validation failures improved with concise error messages.
    • Services and sync pipelines consume normalized payload objects for more consistent processing.
    • Documentation updated to reflect widened accepted payload types.
  • Tests

    • Added/updated tests covering schema parsing, validation, and typed fetcher/service behavior.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Warning

Review limit reached

@jonathanMLDev, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 35 minutes and 11 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2b5dd11a-5d05-44d7-b56b-b5a88b100531

📥 Commits

Reviewing files that changed from the base of the PR and between db6b354 and 28fc70f.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • core/errors.py
  • cppa_slack_tracker/services.py
  • discord_activity_tracker/api_schemas.py
  • discord_activity_tracker/services.py
  • discord_activity_tracker/sync/messages.py
  • discord_activity_tracker/sync/utils.py
  • discord_activity_tracker/tests/test_api_schemas.py
  • github_activity_tracker/api_schemas.py
  • github_activity_tracker/sync/commits.py
  • github_activity_tracker/tests/test_api_schemas.py
  • github_activity_tracker/tests/test_sync_commits.py
  • github_activity_tracker/tests/test_sync_issues_and_prs.py
📝 Walkthrough

Walkthrough

Adds Pydantic v2 ingestion-boundary schemas and parse helpers for GitHub, Slack, and Discord; fetchers now model_validate API responses, services accept typed payloads or dicts (normalized), sync pipelines handle both shapes, classify validation failures as VALIDATION, and tests/docs/pyright were updated.

Changes

Pydantic Boundary Schemas and Integration

Layer / File(s) Summary
Core failure classification
core/errors.py, core/tests/test_errors.py
classify_failure recognizes pydantic.ValidationError and module-specific validation exception classes as CollectorFailureCategory.VALIDATION; tests added.
Slack: schemas and fetchers
cppa_slack_tracker/api_schemas.py, cppa_slack_tracker/fetcher.py, cppa_slack_tracker/tests/*
Adds Slack Pydantic v2 models and parse helpers; fetchers validate responses with model_validate() and return typed payloads; parsing helpers raise SlackApiValidationError on invalid input; tests updated/added.
Slack: services and sync
cppa_slack_tracker/services.py, cppa_slack_tracker/sync/*, cppa_slack_tracker/tests/*, cppa_user_tracker/*
Services and sync code accept typed Slack payloads or dicts (normalized via parse_*), use typed attributes, handle edited/ts shapes, and update related tests and docs.
Discord: live-sync schemas and services
discord_activity_tracker/live_sync_schemas.py, discord_activity_tracker/services.py
Adds Discord live-sync Pydantic models, parse helpers, and a DiscordLiveSyncValidationError; bulk upsert functions accept typed payloads or dicts and normalize inputs before upsert.
Discord: sync utils and messages
discord_activity_tracker/sync/utils.py, discord_activity_tracker/sync/messages.py, discord_activity_tracker/tests/*
Sync and utility code produce/consume DiscordLivePreparedMessage/DiscordLiveUserPayload, normalize entries, and update tests to use attribute access.
GitHub: schemas and fetchers
github_activity_tracker/api_schemas.py, github_activity_tracker/fetcher.py, github_activity_tracker/tests/*
Adds GitHub Pydantic models and parsing helpers; fetchers parse API payloads into typed commits and issue/PR bundle models; tests updated.
GitHub: sync pipelines
github_activity_tracker/sync/commits.py, github_activity_tracker/sync/issues_and_prs.py
Commit and issue/PR sync pipelines accept parsed models or dicts, normalize workspace JSON, persist using model attributes, and adjust truncation/background-worker behavior.
Tests, docs, pyright, changelog
*/tests/*, docs/service_api/*, pyrightconfig.json, CHANGELOG.md
Adjust many tests to use attribute access on parsed models, update generated API docs to show Union types, include new package in pyright config, and add changelog entry.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • wpak-ai
  • snowfox1003

"A rabbit hops through schemas bright,
Parsers tidy payloads by moonlight.
Slack, GitHub, Discord — typed and neat,
Validation caught before services meet. 🐇✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.21% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add API boundary schemas' directly describes the main change—introducing Pydantic models at API ingestion boundaries for GitHub, Slack, and Discord to replace raw dict parameters.
Linked Issues check ✅ Passed All acceptance criteria from issue #238 are met: Pydantic models defined for GitHub, Slack, and Discord; service functions updated to accept typed models; fetchers validate at boundaries with model_validate(); invalid responses raise ValidationError (mapped to VALIDATION in classify_failure); type checker can now verify data flow; comprehensive tests added; ready for reviewer approval.
Out of Scope Changes check ✅ Passed All changes are scoped to the linked issue objectives. Additions include API schema modules, validation-error classification updates, service-function signature changes, sync refactoring to use typed models, comprehensive test coverage, documentation updates, and pyrightconfig updates—all directly supporting the goal of replacing raw dicts with typed boundaries.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@jonathanMLDev
Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cppa_slack_tracker/sync/sync_message.py (1)

266-293: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fetched typed messages are not being persisted.

After moving to typed payloads, Line 291 skips non-dict messages, so API-fetched SlackMessagePayload items never reach save_slack_message. Also, json.dumps(..., default=str) at Line 267/269 can serialize models as strings, not structured message JSON.

Suggested normalization before merge/write + process all payload shapes
+def _msg_to_dict(msg) -> Optional[dict]:
+    if isinstance(msg, dict):
+        return msg
+    if hasattr(msg, "model_dump"):
+        return msg.model_dump()
+    return None
+
@@
-        merged_raw = _merge_messages_by_ts(existing_list, messages)
+        normalized_messages = [m for m in (_msg_to_dict(x) for x in messages) if m]
+        merged_raw = _merge_messages_by_ts(existing_list, normalized_messages)
         raw_payload = json.dumps(merged_raw, indent=2, default=str)
 
-        workspace_payload = json.dumps(messages, indent=2, default=str)
+        workspace_payload = json.dumps(normalized_messages, indent=2, default=str)
@@
-            for msg in messages:
-                if not isinstance(msg, dict):
-                    continue
+            for msg in messages:
                 try:
                     if _process_message(channel, msg):
                         success_count += 1
                 except Exception as e:
-                    logger.debug("Skip message %s: %s", msg.get("ts"), e)
+                    msg_ts = (
+                        msg.ts
+                        if hasattr(msg, "ts")
+                        else (msg.get("ts") if isinstance(msg, dict) else None)
+                    )
+                    logger.debug("Skip message %s: %s", msg_ts, e)
                     error_count += 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/sync/sync_message.py` around lines 266 - 293, The
write/processing path currently serializes and skips non-dict items causing
typed SlackMessagePayload instances to be lost: convert/normalize any model
instances in messages to plain dicts before calling _merge_messages_by_ts,
before json.dumps (avoid default=str) and before iterating to call
save_slack_message; update the logic around merged_raw, workspace_path/raw_path
writes and the for-loop that calls save_slack_message to first transform
SlackMessagePayload (or other model objects) to dict (e.g., via
.dict()/asdict()) and only then merge/write/process so all payload shapes are
persisted and save_slack_message receives structured JSON.
cppa_slack_tracker/sync/sync_user.py (1)

20-33: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Typed user payload support is blocked by dict-only filtering in the caller.

This helper now supports object payloads, but sync_users still rejects non-dicts (Line 73), so typed users from fetch_user_list get skipped as malformed.

Caller-side fix in sync_users
-    for user_data in members:
-        if not isinstance(user_data, dict):
-            logger.warning("Skipping malformed user payload: %r", user_data)
-            error_count += 1
-            continue
+    for user_data in members:
         try:
             if _process_user_info(user_data, include_bots=include_bots):
                 success_count += 1
         except Exception as e:
+            user_id = (
+                user_data.id
+                if hasattr(user_data, "id")
+                else (user_data.get("id") if isinstance(user_data, dict) else None)
+            )
             logger.warning(
                 "Failed to sync user %s: %s",
-                user_data.get("id"),
+                user_id,
                 e,
             )
             error_count += 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/sync/sync_user.py` around lines 20 - 33, sync_users is
filtering out non-dict user payloads before calling _process_user_info, which
prevents typed (object) payloads from being processed; update the caller
(sync_users) so it does not reject non-dict items—either remove the strict
isinstance(..., dict) check or broaden it to accept Mapping-like objects and
objects with attributes (e.g., check for hasattr(item, "get") or hasattr(item,
"is_bot")/getattr availability), and then call _process_user_info for those
items; ensure error handling around get_or_create calls remains to surface
genuine malformed payload errors while allowing typed payloads to flow into
_process_user_info.
github_activity_tracker/fetcher.py (1)

297-311: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate payload item shape before calling .get() to avoid unstructured crashes.

These loops call .get() on external items before ensuring dict shape, so malformed API items can raise AttributeError instead of surfacing as GitHubApiValidationError.

🛠️ Proposed fix
@@
-    results: list = []
+    results: list[GitHubComment] = []
@@
-        for comment in comments:
-            created_str = comment.get("created_at")
+        for comment in comments:
+            parsed = parse_comment(cast(dict[str, Any], comment), source="issue comments")
+            created_str = parsed.created_at
             if created_str:
                 try:
                     c_dt = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
                     if not _in_date_range(c_dt, start_time, end_time):
                         continue
@@
-            if isinstance(comment, dict):
-                results.append(parse_comment(comment))
-            else:
-                results.append(comment)
+            results.append(parsed)
@@
-    results: list = []
+    results: list[GitHubReview] = []
@@
-        for review in reviews:
-            updated_str = review.get("updated_at") or review.get("created_at")
+        for review in reviews:
+            parsed = parse_review(cast(dict[str, Any], review), source="pr reviews")
+            updated_str = parsed.updated_at or parsed.created_at
@@
-            if isinstance(review, dict):
-                results.append(parse_review(review))
-            else:
-                results.append(review)
+            results.append(parsed)

Also applies to: 352-368, 430-439

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@github_activity_tracker/fetcher.py` around lines 297 - 311, Payload items are
being treated as dicts before validating their shape which can raise
AttributeError; update the loop that iterates over comments to check
isinstance(comment, dict) before calling comment.get("created_at") (and
similarly before parse_comment) and if the item is not a dict, raise or
propagate a GitHubApiValidationError with contextual info (include the offending
item and which field was expected) and log via logger.debug; apply the same
pattern to the other similar loops (the blocks using _in_date_range,
parse_comment and parse_* helpers) so all calls to .get() only occur after
confirming the item is a dict.
🧹 Nitpick comments (3)
core/tests/test_errors.py (1)

321-336: ⚡ Quick win

Add tests for the remaining mapped Discord validation exceptions.

classify_failure() now maps four app-specific validation types, but this file only asserts GitHub and Slack custom errors. Add assertions for StagingValidationError and DiscordLiveSyncValidationError so the mapping table is fully regression-covered.

Test additions
+def test_classify_discord_staging_validation_error():
+    from discord_activity_tracker.staging_schema import StagingValidationError
+
+    assert (
+        classify_failure(StagingValidationError("bad staging"))
+        is CollectorFailureCategory.VALIDATION
+    )
+
+
+def test_classify_discord_live_sync_validation_error():
+    from discord_activity_tracker.live_sync_schemas import DiscordLiveSyncValidationError
+
+    assert (
+        classify_failure(DiscordLiveSyncValidationError("bad live sync"))
+        is CollectorFailureCategory.VALIDATION
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/tests/test_errors.py` around lines 321 - 336, Add assertions to cover
the two missing mapped validation exceptions: call classify_failure with
StagingValidationError(...) and with DiscordLiveSyncValidationError(...), and
assert both results are CollectorFailureCategory.VALIDATION; locate the existing
tests using classify_failure and the CollectorFailureCategory.VALIDATION checks
(see test_classify_github_api_validation_error and
test_classify_slack_api_validation_error) and mirror their structure to add
tests for StagingValidationError and DiscordLiveSyncValidationError so both
mapping entries are regression-covered.
docs/service_api/cppa_user_tracker.md (1)

24-24: ⚡ Quick win

Consider tightening the user_data parameter type.

The signature uses Any, while the Slack tracker services use Union[SlackUserPayload, dict[str, Any]] (lines 16-18 in cppa_slack_tracker.md) and Discord services use similar Union patterns (lines 16-19 in discord_activity_tracker.md). Using Any defeats the type-checking benefits introduced by this PR.

💡 Suggested improvement

If the implementation can accept both typed payloads and dicts, update the signature to:

-| `get_or_create_slack_user` | user_data: Any | tuple[SlackUser, bool] | Get or create a SlackUser from Slack API user data. Returns (SlackUser, created). |
+| `get_or_create_slack_user` | user_data: Union[SlackUserPayload, dict[str, Any]] | tuple[SlackUser, bool] | Get or create a SlackUser from Slack API user data. Returns (SlackUser, created). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/service_api/cppa_user_tracker.md` at line 24, Update the parameter type
for get_or_create_slack_user from Any to a tighter Union type (e.g.,
Union[SlackUserPayload, dict[str, Any]]) so consumers get proper type checking;
add the necessary import for SlackUserPayload (or the correct payload type name
used across trackers) and ensure the return annotation remains tuple[SlackUser,
bool] to preserve behavior.
cppa_slack_tracker/models.py (1)

18-27: 💤 Low value

Inconsistent tuple formatting for enum values.

The PUBLIC_CHANNEL and PRIVATE_CHANNEL constants use multi-line parenthesized tuples, while MPIM and IM use single-line tuples. This inconsistency reduces readability.

♻️ Standardize to single-line format
-    PUBLIC_CHANNEL = (
-        "public_channel",
-        "Public channel",
-    )  # pyright: ignore[reportCallIssue]
-    PRIVATE_CHANNEL = (
-        "private_channel",
-        "Private channel",
-    )  # pyright: ignore[reportCallIssue]
+    PUBLIC_CHANNEL = "public_channel", "Public channel"  # pyright: ignore[reportCallIssue]
+    PRIVATE_CHANNEL = "private_channel", "Private channel"  # pyright: ignore[reportCallIssue]
     MPIM = "mpim", "Multi-party direct message"  # pyright: ignore[reportCallIssue]
     IM = "im", "Direct message"  # pyright: ignore[reportCallIssue]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/models.py` around lines 18 - 27, PUBLIC_CHANNEL and
PRIVATE_CHANNEL are formatted as multi-line tuples while MPIM and IM use
single-line tuples; make the formatting consistent by converting PUBLIC_CHANNEL
and PRIVATE_CHANNEL to single-line tuple assignments (e.g., PUBLIC_CHANNEL =
"public_channel", "Public channel") and preserve the existing pyright: ignore
comments and semantics so the constants remain identical in value and tooling
annotations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/errors.py`:
- Around line 282-290: The dynamic exception lookup in classify_failure (the
importlib.import_module(mod_name) / getattr(mod, exc_name) block) only catches
ImportError and can propagate other import-time errors; update this block to
catch all exceptions (e.g., Exception) around the import and attribute access so
classification never raises—silently ignore or log the failure and continue to
the next mapping while still returning CollectorFailureCategory.VALIDATION when
app_exc matches; ensure you reference mod_name, exc_name,
importlib.import_module, getattr, and CollectorFailureCategory.VALIDATION when
making the change.

In `@cppa_slack_tracker/fetcher.py`:
- Around line 50-53: The loop over raw API items currently skips malformed
entries (e.g., the members loop appends only when isinstance(raw, dict) and
parse_user can return None), which hides contract violations; update the
ingestion boundary to parse items directly (e.g., call parse_user(raw) and
append its result to members without silently filtering), and change parse_user
so it raises a descriptive validation exception on malformed input instead of
returning None; apply the same fail-fast pattern to the channels/messages
parsing loops in this file (replace silent checks with direct parsing and let
schema/validation errors propagate) so malformed API items surface as structured
failures.

In `@cppa_slack_tracker/services.py`:
- Around line 319-337: The logic for handling "file_comment" assigns a
placeholder user ("-1") before the existing placeholder skip check, making the
"A file was commented on" branch unreachable; update the "file_comment" branch
to first check if slack_message.user is missing and slack_message.text == "A
file was commented on" and return None early (before calling
_get_or_fetch_slack_user), otherwise call _get_or_fetch_slack_user with the real
user id, then append comment text as currently done; reference symbols: subtype
== "file_comment", slack_message.user, slack_message.text, and
_get_or_fetch_slack_user.

In `@discord_activity_tracker/services.py`:
- Around line 637-641: The code currently uses "count=d.count or 1" when
constructing DiscordReaction in seen[key], which turns a legitimate zero count
into 1; update the assignment to preserve explicit zero values by using an
explicit None check (i.e., set count to d.count when d.count is not None,
otherwise default to 1) so that DiscordReaction (and the seen dict) retains true
zero reaction counts; look for the DiscordReaction construction around seen[key]
and the d.count reference to make this change.

In `@github_activity_tracker/fetcher.py`:
- Around line 265-272: The function currently annotates its return as a plain
`list` (signature `) -> list:`) which loses static type guarantees; change the
return annotation to a concrete type (e.g., `list[dict]` or `List[Dict[str,
Any]]` depending on your typing style) and import the needed typing names
(`List`, `Dict`, `Any`) or rely on PEP 585 `list[dict]` if the project targets
Python 3.9+; apply the same concrete annotation change to the other fetch
function noted around lines 329–335 so both maintain a precise typed boundary
and preserve static flow.

In `@github_activity_tracker/sync/commits.py`:
- Around line 126-137: The current logic treats presence of author_dict (from
commit.author or commit.committer) as sufficient, but parse_github_user may
return a user without account_id causing get_or_create_github_account to receive
None; change the branch to parse the user first (use
parse_github_user(author_dict.model_dump())), then check user_info["account_id"]
is truthy before calling get_or_create_github_account, and if account_id is
missing/empty fall back to calling _commit_author_name_and_email and
get_or_create_unknown_github_account; reference author_dict, parse_github_user,
user_info, get_or_create_github_account, _commit_author_name_and_email, and
get_or_create_unknown_github_account when applying the fix.

---

Outside diff comments:
In `@cppa_slack_tracker/sync/sync_message.py`:
- Around line 266-293: The write/processing path currently serializes and skips
non-dict items causing typed SlackMessagePayload instances to be lost:
convert/normalize any model instances in messages to plain dicts before calling
_merge_messages_by_ts, before json.dumps (avoid default=str) and before
iterating to call save_slack_message; update the logic around merged_raw,
workspace_path/raw_path writes and the for-loop that calls save_slack_message to
first transform SlackMessagePayload (or other model objects) to dict (e.g., via
.dict()/asdict()) and only then merge/write/process so all payload shapes are
persisted and save_slack_message receives structured JSON.

In `@cppa_slack_tracker/sync/sync_user.py`:
- Around line 20-33: sync_users is filtering out non-dict user payloads before
calling _process_user_info, which prevents typed (object) payloads from being
processed; update the caller (sync_users) so it does not reject non-dict
items—either remove the strict isinstance(..., dict) check or broaden it to
accept Mapping-like objects and objects with attributes (e.g., check for
hasattr(item, "get") or hasattr(item, "is_bot")/getattr availability), and then
call _process_user_info for those items; ensure error handling around
get_or_create calls remains to surface genuine malformed payload errors while
allowing typed payloads to flow into _process_user_info.

In `@github_activity_tracker/fetcher.py`:
- Around line 297-311: Payload items are being treated as dicts before
validating their shape which can raise AttributeError; update the loop that
iterates over comments to check isinstance(comment, dict) before calling
comment.get("created_at") (and similarly before parse_comment) and if the item
is not a dict, raise or propagate a GitHubApiValidationError with contextual
info (include the offending item and which field was expected) and log via
logger.debug; apply the same pattern to the other similar loops (the blocks
using _in_date_range, parse_comment and parse_* helpers) so all calls to .get()
only occur after confirming the item is a dict.

---

Nitpick comments:
In `@core/tests/test_errors.py`:
- Around line 321-336: Add assertions to cover the two missing mapped validation
exceptions: call classify_failure with StagingValidationError(...) and with
DiscordLiveSyncValidationError(...), and assert both results are
CollectorFailureCategory.VALIDATION; locate the existing tests using
classify_failure and the CollectorFailureCategory.VALIDATION checks (see
test_classify_github_api_validation_error and
test_classify_slack_api_validation_error) and mirror their structure to add
tests for StagingValidationError and DiscordLiveSyncValidationError so both
mapping entries are regression-covered.

In `@cppa_slack_tracker/models.py`:
- Around line 18-27: PUBLIC_CHANNEL and PRIVATE_CHANNEL are formatted as
multi-line tuples while MPIM and IM use single-line tuples; make the formatting
consistent by converting PUBLIC_CHANNEL and PRIVATE_CHANNEL to single-line tuple
assignments (e.g., PUBLIC_CHANNEL = "public_channel", "Public channel") and
preserve the existing pyright: ignore comments and semantics so the constants
remain identical in value and tooling annotations.

In `@docs/service_api/cppa_user_tracker.md`:
- Line 24: Update the parameter type for get_or_create_slack_user from Any to a
tighter Union type (e.g., Union[SlackUserPayload, dict[str, Any]]) so consumers
get proper type checking; add the necessary import for SlackUserPayload (or the
correct payload type name used across trackers) and ensure the return annotation
remains tuple[SlackUser, bool] to preserve behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b98af81-73de-4f9c-a7bb-d12c01341071

📥 Commits

Reviewing files that changed from the base of the PR and between fcfdda0 and 9c675ce.

📒 Files selected for processing (35)
  • CHANGELOG.md
  • core/errors.py
  • core/tests/test_errors.py
  • cppa_slack_tracker/api_schemas.py
  • cppa_slack_tracker/fetcher.py
  • cppa_slack_tracker/models.py
  • cppa_slack_tracker/services.py
  • cppa_slack_tracker/sync/sync_channel.py
  • cppa_slack_tracker/sync/sync_message.py
  • cppa_slack_tracker/sync/sync_user.py
  • cppa_slack_tracker/tests/test_api_schemas.py
  • cppa_slack_tracker/tests/test_fetcher_unit.py
  • cppa_slack_tracker/tests/test_services.py
  • cppa_user_tracker/services.py
  • cppa_user_tracker/tests/test_services.py
  • discord_activity_tracker/live_sync_schemas.py
  • discord_activity_tracker/services.py
  • discord_activity_tracker/sync/messages.py
  • discord_activity_tracker/sync/utils.py
  • discord_activity_tracker/tests/test_live_sync_schemas.py
  • discord_activity_tracker/tests/test_sync_messages.py
  • discord_activity_tracker/tests/test_sync_utils.py
  • docs/service_api/cppa_slack_tracker.md
  • docs/service_api/cppa_user_tracker.md
  • docs/service_api/discord_activity_tracker.md
  • github_activity_tracker/api_schemas.py
  • github_activity_tracker/fetcher.py
  • github_activity_tracker/sync/commits.py
  • github_activity_tracker/sync/issues_and_prs.py
  • github_activity_tracker/tests/test_api_schemas.py
  • github_activity_tracker/tests/test_fetcher.py
  • github_activity_tracker/tests/test_fetcher_commits_backward.py
  • github_activity_tracker/tests/test_fetcher_issues_and_prs.py
  • github_activity_tracker/tests/test_sync_commits.py
  • pyrightconfig.json

Comment thread core/errors.py
Comment thread cppa_slack_tracker/fetcher.py
Comment thread cppa_slack_tracker/services.py
Comment thread discord_activity_tracker/services.py
Comment thread github_activity_tracker/fetcher.py Outdated
Comment thread github_activity_tracker/sync/commits.py
@wpak-ai
Copy link
Copy Markdown
Collaborator

wpak-ai commented May 26, 2026

@coderabbitai resume amd full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

@wpak-ai Sure! I'll resume automated reviews and trigger a full review of this PR right away.

✅ Actions performed

Reviews resumed.


Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cppa_slack_tracker/sync/sync_user.py (1)

21-34: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Typed payload support is incomplete in the sync loop.

_process_user_info now handles object-style payloads, but sync_users still hard-rejects non-dict entries and later assumes .get(...) in error logging. With typed fetcher payloads, users will be skipped as malformed.

Suggested fix
 def sync_users(
@@
-    for user_data in members:
-        if not isinstance(user_data, dict):
+    for user_data in members:
+        if not isinstance(user_data, dict) and not hasattr(user_data, "id"):
             logger.warning("Skipping malformed user payload: %r", user_data)
             error_count += 1
             continue
         try:
             if _process_user_info(user_data, include_bots=include_bots):
                 success_count += 1
         except Exception as e:
+            user_id = (
+                user_data.get("id")
+                if isinstance(user_data, dict)
+                else getattr(user_data, "id", None)
+            )
             logger.warning(
                 "Failed to sync user %s: %s",
-                user_data.get("id"),
+                user_id,
                 e,
             )
             error_count += 1

Also applies to: 72-76, 81-84

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/sync/sync_user.py` around lines 21 - 34, sync_users
currently rejects non-dict user payloads and later assumes dict methods like
.get when logging, but _process_user_info now supports object-style payloads;
update sync_users (and the user-processing block around is_bot extraction) to
accept both dict and object payloads by using a unified accessor: attempt
attribute access (e.g., user_data.is_bot) and fall back to mapping access
(user_data.get("is_bot")), and avoid hard type checks that return False for
typed payloads; ensure the same pattern is applied to other fields used later in
sync_users and in the error logging paths so logs don't call .get on objects and
object-style users are not skipped as malformed (refer to _process_user_info and
the is_bot extraction in sync_users).
discord_activity_tracker/sync/messages.py (1)

290-295: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Object-shaped messages are grouped but never persisted.

This loop still drops any non-dict payload (continue), so typed/object messages introduced by the boundary-schema migration are silently skipped and never saved.

Suggested fix
-        try:
-            for msg in messages:
-                if not isinstance(msg, dict):
-                    continue
-                try:
-                    if _process_message(channel, msg):
-                        success_count += 1
-                except Exception as e:
-                    logger.debug("Skip message %s: %s", msg.get("ts"), e)
-                    error_count += 1
+        try:
+            for msg in messages:
+                payload = msg if isinstance(msg, dict) else getattr(msg, "model_dump", lambda **_: None)(mode="python")
+                if not isinstance(payload, dict):
+                    continue
+                try:
+                    if _process_message(channel, payload):
+                        success_count += 1
+                except Exception as e:
+                    logger.debug("Skip message %s: %s", payload.get("ts"), e)
+                    error_count += 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@discord_activity_tracker/sync/messages.py` around lines 290 - 295, The loop
inside the channel sync wrapper (the function taking channel, guild_id,
since_date, full_sync) currently drops any non-dict payload with a bare
continue, so typed/object-shaped messages from the boundary-schema migration are
never persisted; change the loop to detect non-dict payloads and normalize them
before persisting (for example, if payload has a to_dict()/dict() method call
that to serialize it, otherwise wrap it in a stable envelope like
{"_type":"object","value":payload} or convert to string) instead of continuing,
and remove the unconditional continue so all messages are saved; update the code
paths that later assume dict payloads to handle the new envelope format.
♻️ Duplicate comments (1)
cppa_slack_tracker/fetcher.py (1)

75-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently return None for malformed users.info payloads.

When Slack returns ok=True but user is malformed, this path suppresses boundary validation instead of surfacing a structured schema error.

Proposed fix
-    user = data.get("user")
-    if not isinstance(user, dict):
-        return None
-    return parse_user(user)
+    return parse_user(data.get("user"), source=f"users.info user_id={user_id}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/fetcher.py` around lines 75 - 78, The code handling the
Slack users.info response currently returns None when data.get("user") is not a
dict, which hides schema problems; instead, replace the silent "return None"
with a raised structured schema error (e.g., raise SchemaValidationError(...) or
ValueError if no schema error class exists) that includes a clear message and
the raw payload (the local variable data) so callers can see the malformed
response; update the function surrounding this snippet (the code that calls
parse_user and references user) to raise this error rather than returning None
and ensure any existing SchemaValidationError type is imported or created if
needed.
🧹 Nitpick comments (4)
docs/service_api/cppa_user_tracker.md (1)

24-24: 💤 Low value

Documentation type inconsistency with Slack tracker pattern.

The user_data parameter is documented as Any, while the related Slack service functions in cppa_slack_tracker.md (lines 16-18) use the more specific Union[SlackUserPayload, dict[str, Any]] pattern. For consistency and better type information, consider documenting this as Union[SlackUserPayload, dict[str, Any]] unless the implementation intentionally differs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/service_api/cppa_user_tracker.md` at line 24, The docstring for
get_or_create_slack_user uses a vague type Any for user_data; change its
documented type to match the Slack tracker pattern by using
Union[SlackUserPayload, dict[str, Any]] so callers get precise typing and
consistency with cppa_slack_tracker functions; update the signature entry for
get_or_create_slack_user to list user_data: Union[SlackUserPayload, dict[str,
Any]] and adjust any surrounding examples or parameter descriptions to reflect
the more specific type.
cppa_slack_tracker/services.py (1)

348-354: 💤 Low value

Consider using getattr with a default for cleaner edited timestamp extraction.

The current isinstance + hasattr chain works but is verbose. Since edited can be dict, a typed object with .ts, or None, a unified approach would be cleaner.

Suggested simplification
     edited = slack_message.edited
-    edited_ts = None
-    if isinstance(edited, dict):
-        edited_ts = edited.get("ts")
-    elif edited is not None and hasattr(edited, "ts"):
-        edited_ts = edited.ts
+    edited_ts = (
+        edited.get("ts") if isinstance(edited, dict)
+        else getattr(edited, "ts", None)
+    )
     updated_at = _parse_slack_ts_string(edited_ts or ts) if edited_ts else created_at
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cppa_slack_tracker/services.py` around lines 348 - 354, Replace the verbose
edited-timestamp extraction with a unified getattr/get pattern: for the variable
edited (from slack_message.edited) retrieve edited_ts by first checking dict via
edited.get("ts") else use getattr(edited, "ts", None) so edited_ts becomes the
ts value or None, then compute updated_at exactly as before with
_parse_slack_ts_string(edited_ts or ts) if edited_ts else created_at; update the
block that defines edited, edited_ts, and updated_at accordingly to use this
simplified logic.
github_activity_tracker/fetcher.py (2)

310-313: 💤 Low value

Unreachable else branch creates type inconsistency.

The API response from client.rest_request() returns JSON, so each comment in the list will always be a dict. The else branch at line 312-313 appends comment as-is (presumably already a GitHubComment), but this path is unreachable from the API. If this defensive code is intentional for future flexibility (e.g., pre-parsed inputs), the function signature should reflect list[dict | GitHubComment] as input. Otherwise, remove the dead branch.

Suggested simplification
-            if isinstance(comment, dict):
-                results.append(parse_comment(comment))
-            else:
-                results.append(comment)
+            results.append(parse_comment(comment))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@github_activity_tracker/fetcher.py` around lines 310 - 313, The else branch
that appends comment as-is is unreachable because comments come from
client.rest_request() as dicts; remove the unreachable else branch and always
call parse_comment(comment) when building results (refer to parse_comment and
the loop over comment), and update the function's type hints to accept
list[dict] for input and return the appropriate parsed type; alternatively, if
you intended to accept already-parsed GitHubComment objects, change the
parameter/type annotation to list[dict | GitHubComment] and keep a runtime
isinstance check—do one of these two consistent fixes so parse_comment,
client.rest_request(), and GitHubComment usage line up.

367-370: 💤 Low value

Same unreachable else branch pattern for reviews.

Identical to the comment-parsing logic: the else branch is dead code since API responses are always dicts.

Suggested simplification
-            if isinstance(review, dict):
-                results.append(parse_review(review))
-            else:
-                results.append(review)
+            results.append(parse_review(review))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@github_activity_tracker/fetcher.py` around lines 367 - 370, The else branch
after handling reviews is dead code because API returns dicts; in the block that
checks "if isinstance(review, dict): results.append(parse_review(review)) else:
results.append(review)" remove the unreachable else and always call
results.append(parse_review(review)) (update where reviews are processed in
fetcher.py and any helper that calls parse_review to reflect the simplified
flow).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cppa_slack_tracker/services.py`:
- Around line 139-146: The current fallback logic lets topic.value overwrite a
non-empty description because the branch "elif isinstance(topic,
SlackTopicPurpose)" runs even when description was set from purpose; change the
topic-handling branch so it only assigns when description is empty — e.g.,
replace the "elif isinstance(topic, SlackTopicPurpose)" with a conditional that
also checks "not description" (so use "if not description and isinstance(topic,
SlackTopicPurpose)") or otherwise ensure both topic dict and SlackTopicPurpose
branches require description to be falsy before assigning; adjust the logic
around the variables purpose, topic, SlackTopicPurpose, and description
accordingly.

In `@github_activity_tracker/api_schemas.py`:
- Around line 59-60: The schemas currently allow required identifiers to be
optional; update GitHubIssue.number, GitHubPullRequest.number, and
GitHubCommit.sha in api_schemas.py to be required by removing the optional union
and any default of None (i.e., change from "int | None = None" or "str | None =
None" to a plain non-optional type with no default), so validation enforces
presence at the ingestion boundary; adjust the three occurrences referenced
(GitHubIssue.number, GitHubPullRequest.number, GitHubCommit.sha) accordingly and
run schema/validation tests.
- Around line 144-152: The validator's _normalize logic currently double-wraps
already-normalized payloads like {"issue": {...}} or {"pr": {...}}; update
_normalize to first check if data contains an "issue" or "pr" key whose value is
a dict and, if so, ensure that its "comments" field exists and is a list (coerce
to [] if missing or not a list) and return {"issue": issue_dict} or {"pr":
pr_dict} directly instead of nesting; apply the same check-and-coerce pattern to
the code paths handling issue_info/pr_info (the branches that reference
issue_info, pr_info and the alternate blocks around the lines noted) so both
normalized and raw payloads are handled consistently.

---

Outside diff comments:
In `@cppa_slack_tracker/sync/sync_user.py`:
- Around line 21-34: sync_users currently rejects non-dict user payloads and
later assumes dict methods like .get when logging, but _process_user_info now
supports object-style payloads; update sync_users (and the user-processing block
around is_bot extraction) to accept both dict and object payloads by using a
unified accessor: attempt attribute access (e.g., user_data.is_bot) and fall
back to mapping access (user_data.get("is_bot")), and avoid hard type checks
that return False for typed payloads; ensure the same pattern is applied to
other fields used later in sync_users and in the error logging paths so logs
don't call .get on objects and object-style users are not skipped as malformed
(refer to _process_user_info and the is_bot extraction in sync_users).

In `@discord_activity_tracker/sync/messages.py`:
- Around line 290-295: The loop inside the channel sync wrapper (the function
taking channel, guild_id, since_date, full_sync) currently drops any non-dict
payload with a bare continue, so typed/object-shaped messages from the
boundary-schema migration are never persisted; change the loop to detect
non-dict payloads and normalize them before persisting (for example, if payload
has a to_dict()/dict() method call that to serialize it, otherwise wrap it in a
stable envelope like {"_type":"object","value":payload} or convert to string)
instead of continuing, and remove the unconditional continue so all messages are
saved; update the code paths that later assume dict payloads to handle the new
envelope format.

---

Duplicate comments:
In `@cppa_slack_tracker/fetcher.py`:
- Around line 75-78: The code handling the Slack users.info response currently
returns None when data.get("user") is not a dict, which hides schema problems;
instead, replace the silent "return None" with a raised structured schema error
(e.g., raise SchemaValidationError(...) or ValueError if no schema error class
exists) that includes a clear message and the raw payload (the local variable
data) so callers can see the malformed response; update the function surrounding
this snippet (the code that calls parse_user and references user) to raise this
error rather than returning None and ensure any existing SchemaValidationError
type is imported or created if needed.

---

Nitpick comments:
In `@cppa_slack_tracker/services.py`:
- Around line 348-354: Replace the verbose edited-timestamp extraction with a
unified getattr/get pattern: for the variable edited (from slack_message.edited)
retrieve edited_ts by first checking dict via edited.get("ts") else use
getattr(edited, "ts", None) so edited_ts becomes the ts value or None, then
compute updated_at exactly as before with _parse_slack_ts_string(edited_ts or
ts) if edited_ts else created_at; update the block that defines edited,
edited_ts, and updated_at accordingly to use this simplified logic.

In `@docs/service_api/cppa_user_tracker.md`:
- Line 24: The docstring for get_or_create_slack_user uses a vague type Any for
user_data; change its documented type to match the Slack tracker pattern by
using Union[SlackUserPayload, dict[str, Any]] so callers get precise typing and
consistency with cppa_slack_tracker functions; update the signature entry for
get_or_create_slack_user to list user_data: Union[SlackUserPayload, dict[str,
Any]] and adjust any surrounding examples or parameter descriptions to reflect
the more specific type.

In `@github_activity_tracker/fetcher.py`:
- Around line 310-313: The else branch that appends comment as-is is unreachable
because comments come from client.rest_request() as dicts; remove the
unreachable else branch and always call parse_comment(comment) when building
results (refer to parse_comment and the loop over comment), and update the
function's type hints to accept list[dict] for input and return the appropriate
parsed type; alternatively, if you intended to accept already-parsed
GitHubComment objects, change the parameter/type annotation to list[dict |
GitHubComment] and keep a runtime isinstance check—do one of these two
consistent fixes so parse_comment, client.rest_request(), and GitHubComment
usage line up.
- Around line 367-370: The else branch after handling reviews is dead code
because API returns dicts; in the block that checks "if isinstance(review,
dict): results.append(parse_review(review)) else: results.append(review)" remove
the unreachable else and always call results.append(parse_review(review))
(update where reviews are processed in fetcher.py and any helper that calls
parse_review to reflect the simplified flow).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3159672-857b-416e-9f8b-d4129b7810fb

📥 Commits

Reviewing files that changed from the base of the PR and between fcfdda0 and db6b354.

📒 Files selected for processing (35)
  • CHANGELOG.md
  • core/errors.py
  • core/tests/test_errors.py
  • cppa_slack_tracker/api_schemas.py
  • cppa_slack_tracker/fetcher.py
  • cppa_slack_tracker/models.py
  • cppa_slack_tracker/services.py
  • cppa_slack_tracker/sync/sync_channel.py
  • cppa_slack_tracker/sync/sync_message.py
  • cppa_slack_tracker/sync/sync_user.py
  • cppa_slack_tracker/tests/test_api_schemas.py
  • cppa_slack_tracker/tests/test_fetcher_unit.py
  • cppa_slack_tracker/tests/test_services.py
  • cppa_user_tracker/services.py
  • cppa_user_tracker/tests/test_services.py
  • discord_activity_tracker/live_sync_schemas.py
  • discord_activity_tracker/services.py
  • discord_activity_tracker/sync/messages.py
  • discord_activity_tracker/sync/utils.py
  • discord_activity_tracker/tests/test_live_sync_schemas.py
  • discord_activity_tracker/tests/test_sync_messages.py
  • discord_activity_tracker/tests/test_sync_utils.py
  • docs/service_api/cppa_slack_tracker.md
  • docs/service_api/cppa_user_tracker.md
  • docs/service_api/discord_activity_tracker.md
  • github_activity_tracker/api_schemas.py
  • github_activity_tracker/fetcher.py
  • github_activity_tracker/sync/commits.py
  • github_activity_tracker/sync/issues_and_prs.py
  • github_activity_tracker/tests/test_api_schemas.py
  • github_activity_tracker/tests/test_fetcher.py
  • github_activity_tracker/tests/test_fetcher_commits_backward.py
  • github_activity_tracker/tests/test_fetcher_issues_and_prs.py
  • github_activity_tracker/tests/test_sync_commits.py
  • pyrightconfig.json

Comment thread cppa_slack_tracker/services.py Outdated
Comment thread github_activity_tracker/api_schemas.py Outdated
Comment thread github_activity_tracker/api_schemas.py
Comment thread CHANGELOG.md
Comment thread discord_activity_tracker/live_sync_schemas.py Outdated
@jonathanMLDev jonathanMLDev requested a review from snowfox1003 May 27, 2026 14:28
@wpak-ai wpak-ai merged commit d416a19 into cppalliance:develop May 27, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Boundary Schemas for API Ingestion Points Replacing Raw Dicts

3 participants