Add API boundary schemas#241
Conversation
|
Warning Review limit reached
More reviews will be available in 35 minutes and 11 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughAdds Pydantic v2 ingestion-boundary schemas and parse helpers for GitHub, Slack, and Discord; fetchers now model_validate API responses, services accept typed payloads or dicts (normalized), sync pipelines handle both shapes, classify validation failures as VALIDATION, and tests/docs/pyright were updated. ChangesPydantic Boundary Schemas and Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cppa_slack_tracker/sync/sync_message.py (1)
266-293:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFetched typed messages are not being persisted.
After moving to typed payloads, Line 291 skips non-dict messages, so API-fetched
SlackMessagePayloaditems never reachsave_slack_message. Also,json.dumps(..., default=str)at Line 267/269 can serialize models as strings, not structured message JSON.Suggested normalization before merge/write + process all payload shapes
+def _msg_to_dict(msg) -> Optional[dict]: + if isinstance(msg, dict): + return msg + if hasattr(msg, "model_dump"): + return msg.model_dump() + return None + @@ - merged_raw = _merge_messages_by_ts(existing_list, messages) + normalized_messages = [m for m in (_msg_to_dict(x) for x in messages) if m] + merged_raw = _merge_messages_by_ts(existing_list, normalized_messages) raw_payload = json.dumps(merged_raw, indent=2, default=str) - workspace_payload = json.dumps(messages, indent=2, default=str) + workspace_payload = json.dumps(normalized_messages, indent=2, default=str) @@ - for msg in messages: - if not isinstance(msg, dict): - continue + for msg in messages: try: if _process_message(channel, msg): success_count += 1 except Exception as e: - logger.debug("Skip message %s: %s", msg.get("ts"), e) + msg_ts = ( + msg.ts + if hasattr(msg, "ts") + else (msg.get("ts") if isinstance(msg, dict) else None) + ) + logger.debug("Skip message %s: %s", msg_ts, e) error_count += 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/sync/sync_message.py` around lines 266 - 293, The write/processing path currently serializes and skips non-dict items causing typed SlackMessagePayload instances to be lost: convert/normalize any model instances in messages to plain dicts before calling _merge_messages_by_ts, before json.dumps (avoid default=str) and before iterating to call save_slack_message; update the logic around merged_raw, workspace_path/raw_path writes and the for-loop that calls save_slack_message to first transform SlackMessagePayload (or other model objects) to dict (e.g., via .dict()/asdict()) and only then merge/write/process so all payload shapes are persisted and save_slack_message receives structured JSON.cppa_slack_tracker/sync/sync_user.py (1)
20-33:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winTyped user payload support is blocked by dict-only filtering in the caller.
This helper now supports object payloads, but
sync_usersstill rejects non-dicts (Line 73), so typed users fromfetch_user_listget skipped as malformed.Caller-side fix in
sync_users- for user_data in members: - if not isinstance(user_data, dict): - logger.warning("Skipping malformed user payload: %r", user_data) - error_count += 1 - continue + for user_data in members: try: if _process_user_info(user_data, include_bots=include_bots): success_count += 1 except Exception as e: + user_id = ( + user_data.id + if hasattr(user_data, "id") + else (user_data.get("id") if isinstance(user_data, dict) else None) + ) logger.warning( "Failed to sync user %s: %s", - user_data.get("id"), + user_id, e, ) error_count += 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/sync/sync_user.py` around lines 20 - 33, sync_users is filtering out non-dict user payloads before calling _process_user_info, which prevents typed (object) payloads from being processed; update the caller (sync_users) so it does not reject non-dict items—either remove the strict isinstance(..., dict) check or broaden it to accept Mapping-like objects and objects with attributes (e.g., check for hasattr(item, "get") or hasattr(item, "is_bot")/getattr availability), and then call _process_user_info for those items; ensure error handling around get_or_create calls remains to surface genuine malformed payload errors while allowing typed payloads to flow into _process_user_info.github_activity_tracker/fetcher.py (1)
297-311:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate payload item shape before calling
.get()to avoid unstructured crashes.These loops call
.get()on external items before ensuring dict shape, so malformed API items can raiseAttributeErrorinstead of surfacing asGitHubApiValidationError.🛠️ Proposed fix
@@ - results: list = [] + results: list[GitHubComment] = [] @@ - for comment in comments: - created_str = comment.get("created_at") + for comment in comments: + parsed = parse_comment(cast(dict[str, Any], comment), source="issue comments") + created_str = parsed.created_at if created_str: try: c_dt = datetime.fromisoformat(created_str.replace("Z", "+00:00")) if not _in_date_range(c_dt, start_time, end_time): continue @@ - if isinstance(comment, dict): - results.append(parse_comment(comment)) - else: - results.append(comment) + results.append(parsed) @@ - results: list = [] + results: list[GitHubReview] = [] @@ - for review in reviews: - updated_str = review.get("updated_at") or review.get("created_at") + for review in reviews: + parsed = parse_review(cast(dict[str, Any], review), source="pr reviews") + updated_str = parsed.updated_at or parsed.created_at @@ - if isinstance(review, dict): - results.append(parse_review(review)) - else: - results.append(review) + results.append(parsed)Also applies to: 352-368, 430-439
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@github_activity_tracker/fetcher.py` around lines 297 - 311, Payload items are being treated as dicts before validating their shape which can raise AttributeError; update the loop that iterates over comments to check isinstance(comment, dict) before calling comment.get("created_at") (and similarly before parse_comment) and if the item is not a dict, raise or propagate a GitHubApiValidationError with contextual info (include the offending item and which field was expected) and log via logger.debug; apply the same pattern to the other similar loops (the blocks using _in_date_range, parse_comment and parse_* helpers) so all calls to .get() only occur after confirming the item is a dict.
🧹 Nitpick comments (3)
core/tests/test_errors.py (1)
321-336: ⚡ Quick winAdd tests for the remaining mapped Discord validation exceptions.
classify_failure()now maps four app-specific validation types, but this file only asserts GitHub and Slack custom errors. Add assertions forStagingValidationErrorandDiscordLiveSyncValidationErrorso the mapping table is fully regression-covered.Test additions
+def test_classify_discord_staging_validation_error(): + from discord_activity_tracker.staging_schema import StagingValidationError + + assert ( + classify_failure(StagingValidationError("bad staging")) + is CollectorFailureCategory.VALIDATION + ) + + +def test_classify_discord_live_sync_validation_error(): + from discord_activity_tracker.live_sync_schemas import DiscordLiveSyncValidationError + + assert ( + classify_failure(DiscordLiveSyncValidationError("bad live sync")) + is CollectorFailureCategory.VALIDATION + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/tests/test_errors.py` around lines 321 - 336, Add assertions to cover the two missing mapped validation exceptions: call classify_failure with StagingValidationError(...) and with DiscordLiveSyncValidationError(...), and assert both results are CollectorFailureCategory.VALIDATION; locate the existing tests using classify_failure and the CollectorFailureCategory.VALIDATION checks (see test_classify_github_api_validation_error and test_classify_slack_api_validation_error) and mirror their structure to add tests for StagingValidationError and DiscordLiveSyncValidationError so both mapping entries are regression-covered.docs/service_api/cppa_user_tracker.md (1)
24-24: ⚡ Quick winConsider tightening the
user_dataparameter type.The signature uses
Any, while the Slack tracker services useUnion[SlackUserPayload, dict[str, Any]](lines 16-18 incppa_slack_tracker.md) and Discord services use similar Union patterns (lines 16-19 indiscord_activity_tracker.md). UsingAnydefeats the type-checking benefits introduced by this PR.💡 Suggested improvement
If the implementation can accept both typed payloads and dicts, update the signature to:
-| `get_or_create_slack_user` | user_data: Any | tuple[SlackUser, bool] | Get or create a SlackUser from Slack API user data. Returns (SlackUser, created). | +| `get_or_create_slack_user` | user_data: Union[SlackUserPayload, dict[str, Any]] | tuple[SlackUser, bool] | Get or create a SlackUser from Slack API user data. Returns (SlackUser, created). |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/service_api/cppa_user_tracker.md` at line 24, Update the parameter type for get_or_create_slack_user from Any to a tighter Union type (e.g., Union[SlackUserPayload, dict[str, Any]]) so consumers get proper type checking; add the necessary import for SlackUserPayload (or the correct payload type name used across trackers) and ensure the return annotation remains tuple[SlackUser, bool] to preserve behavior.cppa_slack_tracker/models.py (1)
18-27: 💤 Low valueInconsistent tuple formatting for enum values.
The
PUBLIC_CHANNELandPRIVATE_CHANNELconstants use multi-line parenthesized tuples, whileMPIMandIMuse single-line tuples. This inconsistency reduces readability.♻️ Standardize to single-line format
- PUBLIC_CHANNEL = ( - "public_channel", - "Public channel", - ) # pyright: ignore[reportCallIssue] - PRIVATE_CHANNEL = ( - "private_channel", - "Private channel", - ) # pyright: ignore[reportCallIssue] + PUBLIC_CHANNEL = "public_channel", "Public channel" # pyright: ignore[reportCallIssue] + PRIVATE_CHANNEL = "private_channel", "Private channel" # pyright: ignore[reportCallIssue] MPIM = "mpim", "Multi-party direct message" # pyright: ignore[reportCallIssue] IM = "im", "Direct message" # pyright: ignore[reportCallIssue]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/models.py` around lines 18 - 27, PUBLIC_CHANNEL and PRIVATE_CHANNEL are formatted as multi-line tuples while MPIM and IM use single-line tuples; make the formatting consistent by converting PUBLIC_CHANNEL and PRIVATE_CHANNEL to single-line tuple assignments (e.g., PUBLIC_CHANNEL = "public_channel", "Public channel") and preserve the existing pyright: ignore comments and semantics so the constants remain identical in value and tooling annotations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/errors.py`:
- Around line 282-290: The dynamic exception lookup in classify_failure (the
importlib.import_module(mod_name) / getattr(mod, exc_name) block) only catches
ImportError and can propagate other import-time errors; update this block to
catch all exceptions (e.g., Exception) around the import and attribute access so
classification never raises—silently ignore or log the failure and continue to
the next mapping while still returning CollectorFailureCategory.VALIDATION when
app_exc matches; ensure you reference mod_name, exc_name,
importlib.import_module, getattr, and CollectorFailureCategory.VALIDATION when
making the change.
In `@cppa_slack_tracker/fetcher.py`:
- Around line 50-53: The loop over raw API items currently skips malformed
entries (e.g., the members loop appends only when isinstance(raw, dict) and
parse_user can return None), which hides contract violations; update the
ingestion boundary to parse items directly (e.g., call parse_user(raw) and
append its result to members without silently filtering), and change parse_user
so it raises a descriptive validation exception on malformed input instead of
returning None; apply the same fail-fast pattern to the channels/messages
parsing loops in this file (replace silent checks with direct parsing and let
schema/validation errors propagate) so malformed API items surface as structured
failures.
In `@cppa_slack_tracker/services.py`:
- Around line 319-337: The logic for handling "file_comment" assigns a
placeholder user ("-1") before the existing placeholder skip check, making the
"A file was commented on" branch unreachable; update the "file_comment" branch
to first check if slack_message.user is missing and slack_message.text == "A
file was commented on" and return None early (before calling
_get_or_fetch_slack_user), otherwise call _get_or_fetch_slack_user with the real
user id, then append comment text as currently done; reference symbols: subtype
== "file_comment", slack_message.user, slack_message.text, and
_get_or_fetch_slack_user.
In `@discord_activity_tracker/services.py`:
- Around line 637-641: The code currently uses "count=d.count or 1" when
constructing DiscordReaction in seen[key], which turns a legitimate zero count
into 1; update the assignment to preserve explicit zero values by using an
explicit None check (i.e., set count to d.count when d.count is not None,
otherwise default to 1) so that DiscordReaction (and the seen dict) retains true
zero reaction counts; look for the DiscordReaction construction around seen[key]
and the d.count reference to make this change.
In `@github_activity_tracker/fetcher.py`:
- Around line 265-272: The function currently annotates its return as a plain
`list` (signature `) -> list:`) which loses static type guarantees; change the
return annotation to a concrete type (e.g., `list[dict]` or `List[Dict[str,
Any]]` depending on your typing style) and import the needed typing names
(`List`, `Dict`, `Any`) or rely on PEP 585 `list[dict]` if the project targets
Python 3.9+; apply the same concrete annotation change to the other fetch
function noted around lines 329–335 so both maintain a precise typed boundary
and preserve static flow.
In `@github_activity_tracker/sync/commits.py`:
- Around line 126-137: The current logic treats presence of author_dict (from
commit.author or commit.committer) as sufficient, but parse_github_user may
return a user without account_id causing get_or_create_github_account to receive
None; change the branch to parse the user first (use
parse_github_user(author_dict.model_dump())), then check user_info["account_id"]
is truthy before calling get_or_create_github_account, and if account_id is
missing/empty fall back to calling _commit_author_name_and_email and
get_or_create_unknown_github_account; reference author_dict, parse_github_user,
user_info, get_or_create_github_account, _commit_author_name_and_email, and
get_or_create_unknown_github_account when applying the fix.
---
Outside diff comments:
In `@cppa_slack_tracker/sync/sync_message.py`:
- Around line 266-293: The write/processing path currently serializes and skips
non-dict items causing typed SlackMessagePayload instances to be lost:
convert/normalize any model instances in messages to plain dicts before calling
_merge_messages_by_ts, before json.dumps (avoid default=str) and before
iterating to call save_slack_message; update the logic around merged_raw,
workspace_path/raw_path writes and the for-loop that calls save_slack_message to
first transform SlackMessagePayload (or other model objects) to dict (e.g., via
.dict()/asdict()) and only then merge/write/process so all payload shapes are
persisted and save_slack_message receives structured JSON.
In `@cppa_slack_tracker/sync/sync_user.py`:
- Around line 20-33: sync_users is filtering out non-dict user payloads before
calling _process_user_info, which prevents typed (object) payloads from being
processed; update the caller (sync_users) so it does not reject non-dict
items—either remove the strict isinstance(..., dict) check or broaden it to
accept Mapping-like objects and objects with attributes (e.g., check for
hasattr(item, "get") or hasattr(item, "is_bot")/getattr availability), and then
call _process_user_info for those items; ensure error handling around
get_or_create calls remains to surface genuine malformed payload errors while
allowing typed payloads to flow into _process_user_info.
In `@github_activity_tracker/fetcher.py`:
- Around line 297-311: Payload items are being treated as dicts before
validating their shape which can raise AttributeError; update the loop that
iterates over comments to check isinstance(comment, dict) before calling
comment.get("created_at") (and similarly before parse_comment) and if the item
is not a dict, raise or propagate a GitHubApiValidationError with contextual
info (include the offending item and which field was expected) and log via
logger.debug; apply the same pattern to the other similar loops (the blocks
using _in_date_range, parse_comment and parse_* helpers) so all calls to .get()
only occur after confirming the item is a dict.
---
Nitpick comments:
In `@core/tests/test_errors.py`:
- Around line 321-336: Add assertions to cover the two missing mapped validation
exceptions: call classify_failure with StagingValidationError(...) and with
DiscordLiveSyncValidationError(...), and assert both results are
CollectorFailureCategory.VALIDATION; locate the existing tests using
classify_failure and the CollectorFailureCategory.VALIDATION checks (see
test_classify_github_api_validation_error and
test_classify_slack_api_validation_error) and mirror their structure to add
tests for StagingValidationError and DiscordLiveSyncValidationError so both
mapping entries are regression-covered.
In `@cppa_slack_tracker/models.py`:
- Around line 18-27: PUBLIC_CHANNEL and PRIVATE_CHANNEL are formatted as
multi-line tuples while MPIM and IM use single-line tuples; make the formatting
consistent by converting PUBLIC_CHANNEL and PRIVATE_CHANNEL to single-line tuple
assignments (e.g., PUBLIC_CHANNEL = "public_channel", "Public channel") and
preserve the existing pyright: ignore comments and semantics so the constants
remain identical in value and tooling annotations.
In `@docs/service_api/cppa_user_tracker.md`:
- Line 24: Update the parameter type for get_or_create_slack_user from Any to a
tighter Union type (e.g., Union[SlackUserPayload, dict[str, Any]]) so consumers
get proper type checking; add the necessary import for SlackUserPayload (or the
correct payload type name used across trackers) and ensure the return annotation
remains tuple[SlackUser, bool] to preserve behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1b98af81-73de-4f9c-a7bb-d12c01341071
📒 Files selected for processing (35)
CHANGELOG.mdcore/errors.pycore/tests/test_errors.pycppa_slack_tracker/api_schemas.pycppa_slack_tracker/fetcher.pycppa_slack_tracker/models.pycppa_slack_tracker/services.pycppa_slack_tracker/sync/sync_channel.pycppa_slack_tracker/sync/sync_message.pycppa_slack_tracker/sync/sync_user.pycppa_slack_tracker/tests/test_api_schemas.pycppa_slack_tracker/tests/test_fetcher_unit.pycppa_slack_tracker/tests/test_services.pycppa_user_tracker/services.pycppa_user_tracker/tests/test_services.pydiscord_activity_tracker/live_sync_schemas.pydiscord_activity_tracker/services.pydiscord_activity_tracker/sync/messages.pydiscord_activity_tracker/sync/utils.pydiscord_activity_tracker/tests/test_live_sync_schemas.pydiscord_activity_tracker/tests/test_sync_messages.pydiscord_activity_tracker/tests/test_sync_utils.pydocs/service_api/cppa_slack_tracker.mddocs/service_api/cppa_user_tracker.mddocs/service_api/discord_activity_tracker.mdgithub_activity_tracker/api_schemas.pygithub_activity_tracker/fetcher.pygithub_activity_tracker/sync/commits.pygithub_activity_tracker/sync/issues_and_prs.pygithub_activity_tracker/tests/test_api_schemas.pygithub_activity_tracker/tests/test_fetcher.pygithub_activity_tracker/tests/test_fetcher_commits_backward.pygithub_activity_tracker/tests/test_fetcher_issues_and_prs.pygithub_activity_tracker/tests/test_sync_commits.pypyrightconfig.json
|
@coderabbitai resume amd full review |
|
✅ Actions performedReviews resumed. Full review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cppa_slack_tracker/sync/sync_user.py (1)
21-34:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winTyped payload support is incomplete in the sync loop.
_process_user_infonow handles object-style payloads, butsync_usersstill hard-rejects non-dictentries and later assumes.get(...)in error logging. With typed fetcher payloads, users will be skipped as malformed.Suggested fix
def sync_users( @@ - for user_data in members: - if not isinstance(user_data, dict): + for user_data in members: + if not isinstance(user_data, dict) and not hasattr(user_data, "id"): logger.warning("Skipping malformed user payload: %r", user_data) error_count += 1 continue try: if _process_user_info(user_data, include_bots=include_bots): success_count += 1 except Exception as e: + user_id = ( + user_data.get("id") + if isinstance(user_data, dict) + else getattr(user_data, "id", None) + ) logger.warning( "Failed to sync user %s: %s", - user_data.get("id"), + user_id, e, ) error_count += 1Also applies to: 72-76, 81-84
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/sync/sync_user.py` around lines 21 - 34, sync_users currently rejects non-dict user payloads and later assumes dict methods like .get when logging, but _process_user_info now supports object-style payloads; update sync_users (and the user-processing block around is_bot extraction) to accept both dict and object payloads by using a unified accessor: attempt attribute access (e.g., user_data.is_bot) and fall back to mapping access (user_data.get("is_bot")), and avoid hard type checks that return False for typed payloads; ensure the same pattern is applied to other fields used later in sync_users and in the error logging paths so logs don't call .get on objects and object-style users are not skipped as malformed (refer to _process_user_info and the is_bot extraction in sync_users).discord_activity_tracker/sync/messages.py (1)
290-295:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winObject-shaped messages are grouped but never persisted.
This loop still drops any non-dict payload (
continue), so typed/object messages introduced by the boundary-schema migration are silently skipped and never saved.Suggested fix
- try: - for msg in messages: - if not isinstance(msg, dict): - continue - try: - if _process_message(channel, msg): - success_count += 1 - except Exception as e: - logger.debug("Skip message %s: %s", msg.get("ts"), e) - error_count += 1 + try: + for msg in messages: + payload = msg if isinstance(msg, dict) else getattr(msg, "model_dump", lambda **_: None)(mode="python") + if not isinstance(payload, dict): + continue + try: + if _process_message(channel, payload): + success_count += 1 + except Exception as e: + logger.debug("Skip message %s: %s", payload.get("ts"), e) + error_count += 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/sync/messages.py` around lines 290 - 295, The loop inside the channel sync wrapper (the function taking channel, guild_id, since_date, full_sync) currently drops any non-dict payload with a bare continue, so typed/object-shaped messages from the boundary-schema migration are never persisted; change the loop to detect non-dict payloads and normalize them before persisting (for example, if payload has a to_dict()/dict() method call that to serialize it, otherwise wrap it in a stable envelope like {"_type":"object","value":payload} or convert to string) instead of continuing, and remove the unconditional continue so all messages are saved; update the code paths that later assume dict payloads to handle the new envelope format.
♻️ Duplicate comments (1)
cppa_slack_tracker/fetcher.py (1)
75-78:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t silently return
Nonefor malformedusers.infopayloads.When Slack returns
ok=Truebutuseris malformed, this path suppresses boundary validation instead of surfacing a structured schema error.Proposed fix
- user = data.get("user") - if not isinstance(user, dict): - return None - return parse_user(user) + return parse_user(data.get("user"), source=f"users.info user_id={user_id}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/fetcher.py` around lines 75 - 78, The code handling the Slack users.info response currently returns None when data.get("user") is not a dict, which hides schema problems; instead, replace the silent "return None" with a raised structured schema error (e.g., raise SchemaValidationError(...) or ValueError if no schema error class exists) that includes a clear message and the raw payload (the local variable data) so callers can see the malformed response; update the function surrounding this snippet (the code that calls parse_user and references user) to raise this error rather than returning None and ensure any existing SchemaValidationError type is imported or created if needed.
🧹 Nitpick comments (4)
docs/service_api/cppa_user_tracker.md (1)
24-24: 💤 Low valueDocumentation type inconsistency with Slack tracker pattern.
The
user_dataparameter is documented asAny, while the related Slack service functions incppa_slack_tracker.md(lines 16-18) use the more specificUnion[SlackUserPayload, dict[str, Any]]pattern. For consistency and better type information, consider documenting this asUnion[SlackUserPayload, dict[str, Any]]unless the implementation intentionally differs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/service_api/cppa_user_tracker.md` at line 24, The docstring for get_or_create_slack_user uses a vague type Any for user_data; change its documented type to match the Slack tracker pattern by using Union[SlackUserPayload, dict[str, Any]] so callers get precise typing and consistency with cppa_slack_tracker functions; update the signature entry for get_or_create_slack_user to list user_data: Union[SlackUserPayload, dict[str, Any]] and adjust any surrounding examples or parameter descriptions to reflect the more specific type.cppa_slack_tracker/services.py (1)
348-354: 💤 Low valueConsider using
getattrwith a default for cleaner edited timestamp extraction.The current
isinstance+hasattrchain works but is verbose. Sinceeditedcan bedict, a typed object with.ts, orNone, a unified approach would be cleaner.Suggested simplification
edited = slack_message.edited - edited_ts = None - if isinstance(edited, dict): - edited_ts = edited.get("ts") - elif edited is not None and hasattr(edited, "ts"): - edited_ts = edited.ts + edited_ts = ( + edited.get("ts") if isinstance(edited, dict) + else getattr(edited, "ts", None) + ) updated_at = _parse_slack_ts_string(edited_ts or ts) if edited_ts else created_at🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/services.py` around lines 348 - 354, Replace the verbose edited-timestamp extraction with a unified getattr/get pattern: for the variable edited (from slack_message.edited) retrieve edited_ts by first checking dict via edited.get("ts") else use getattr(edited, "ts", None) so edited_ts becomes the ts value or None, then compute updated_at exactly as before with _parse_slack_ts_string(edited_ts or ts) if edited_ts else created_at; update the block that defines edited, edited_ts, and updated_at accordingly to use this simplified logic.github_activity_tracker/fetcher.py (2)
310-313: 💤 Low valueUnreachable
elsebranch creates type inconsistency.The API response from
client.rest_request()returns JSON, so eachcommentin the list will always be adict. Theelsebranch at line 312-313 appendscommentas-is (presumably already aGitHubComment), but this path is unreachable from the API. If this defensive code is intentional for future flexibility (e.g., pre-parsed inputs), the function signature should reflectlist[dict | GitHubComment]as input. Otherwise, remove the dead branch.Suggested simplification
- if isinstance(comment, dict): - results.append(parse_comment(comment)) - else: - results.append(comment) + results.append(parse_comment(comment))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@github_activity_tracker/fetcher.py` around lines 310 - 313, The else branch that appends comment as-is is unreachable because comments come from client.rest_request() as dicts; remove the unreachable else branch and always call parse_comment(comment) when building results (refer to parse_comment and the loop over comment), and update the function's type hints to accept list[dict] for input and return the appropriate parsed type; alternatively, if you intended to accept already-parsed GitHubComment objects, change the parameter/type annotation to list[dict | GitHubComment] and keep a runtime isinstance check—do one of these two consistent fixes so parse_comment, client.rest_request(), and GitHubComment usage line up.
367-370: 💤 Low valueSame unreachable
elsebranch pattern for reviews.Identical to the comment-parsing logic: the
elsebranch is dead code since API responses are always dicts.Suggested simplification
- if isinstance(review, dict): - results.append(parse_review(review)) - else: - results.append(review) + results.append(parse_review(review))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@github_activity_tracker/fetcher.py` around lines 367 - 370, The else branch after handling reviews is dead code because API returns dicts; in the block that checks "if isinstance(review, dict): results.append(parse_review(review)) else: results.append(review)" remove the unreachable else and always call results.append(parse_review(review)) (update where reviews are processed in fetcher.py and any helper that calls parse_review to reflect the simplified flow).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cppa_slack_tracker/services.py`:
- Around line 139-146: The current fallback logic lets topic.value overwrite a
non-empty description because the branch "elif isinstance(topic,
SlackTopicPurpose)" runs even when description was set from purpose; change the
topic-handling branch so it only assigns when description is empty — e.g.,
replace the "elif isinstance(topic, SlackTopicPurpose)" with a conditional that
also checks "not description" (so use "if not description and isinstance(topic,
SlackTopicPurpose)") or otherwise ensure both topic dict and SlackTopicPurpose
branches require description to be falsy before assigning; adjust the logic
around the variables purpose, topic, SlackTopicPurpose, and description
accordingly.
In `@github_activity_tracker/api_schemas.py`:
- Around line 59-60: The schemas currently allow required identifiers to be
optional; update GitHubIssue.number, GitHubPullRequest.number, and
GitHubCommit.sha in api_schemas.py to be required by removing the optional union
and any default of None (i.e., change from "int | None = None" or "str | None =
None" to a plain non-optional type with no default), so validation enforces
presence at the ingestion boundary; adjust the three occurrences referenced
(GitHubIssue.number, GitHubPullRequest.number, GitHubCommit.sha) accordingly and
run schema/validation tests.
- Around line 144-152: The validator's _normalize logic currently double-wraps
already-normalized payloads like {"issue": {...}} or {"pr": {...}}; update
_normalize to first check if data contains an "issue" or "pr" key whose value is
a dict and, if so, ensure that its "comments" field exists and is a list (coerce
to [] if missing or not a list) and return {"issue": issue_dict} or {"pr":
pr_dict} directly instead of nesting; apply the same check-and-coerce pattern to
the code paths handling issue_info/pr_info (the branches that reference
issue_info, pr_info and the alternate blocks around the lines noted) so both
normalized and raw payloads are handled consistently.
---
Outside diff comments:
In `@cppa_slack_tracker/sync/sync_user.py`:
- Around line 21-34: sync_users currently rejects non-dict user payloads and
later assumes dict methods like .get when logging, but _process_user_info now
supports object-style payloads; update sync_users (and the user-processing block
around is_bot extraction) to accept both dict and object payloads by using a
unified accessor: attempt attribute access (e.g., user_data.is_bot) and fall
back to mapping access (user_data.get("is_bot")), and avoid hard type checks
that return False for typed payloads; ensure the same pattern is applied to
other fields used later in sync_users and in the error logging paths so logs
don't call .get on objects and object-style users are not skipped as malformed
(refer to _process_user_info and the is_bot extraction in sync_users).
In `@discord_activity_tracker/sync/messages.py`:
- Around line 290-295: The loop inside the channel sync wrapper (the function
taking channel, guild_id, since_date, full_sync) currently drops any non-dict
payload with a bare continue, so typed/object-shaped messages from the
boundary-schema migration are never persisted; change the loop to detect
non-dict payloads and normalize them before persisting (for example, if payload
has a to_dict()/dict() method call that to serialize it, otherwise wrap it in a
stable envelope like {"_type":"object","value":payload} or convert to string)
instead of continuing, and remove the unconditional continue so all messages are
saved; update the code paths that later assume dict payloads to handle the new
envelope format.
---
Duplicate comments:
In `@cppa_slack_tracker/fetcher.py`:
- Around line 75-78: The code handling the Slack users.info response currently
returns None when data.get("user") is not a dict, which hides schema problems;
instead, replace the silent "return None" with a raised structured schema error
(e.g., raise SchemaValidationError(...) or ValueError if no schema error class
exists) that includes a clear message and the raw payload (the local variable
data) so callers can see the malformed response; update the function surrounding
this snippet (the code that calls parse_user and references user) to raise this
error rather than returning None and ensure any existing SchemaValidationError
type is imported or created if needed.
---
Nitpick comments:
In `@cppa_slack_tracker/services.py`:
- Around line 348-354: Replace the verbose edited-timestamp extraction with a
unified getattr/get pattern: for the variable edited (from slack_message.edited)
retrieve edited_ts by first checking dict via edited.get("ts") else use
getattr(edited, "ts", None) so edited_ts becomes the ts value or None, then
compute updated_at exactly as before with _parse_slack_ts_string(edited_ts or
ts) if edited_ts else created_at; update the block that defines edited,
edited_ts, and updated_at accordingly to use this simplified logic.
In `@docs/service_api/cppa_user_tracker.md`:
- Line 24: The docstring for get_or_create_slack_user uses a vague type Any for
user_data; change its documented type to match the Slack tracker pattern by
using Union[SlackUserPayload, dict[str, Any]] so callers get precise typing and
consistency with cppa_slack_tracker functions; update the signature entry for
get_or_create_slack_user to list user_data: Union[SlackUserPayload, dict[str,
Any]] and adjust any surrounding examples or parameter descriptions to reflect
the more specific type.
In `@github_activity_tracker/fetcher.py`:
- Around line 310-313: The else branch that appends comment as-is is unreachable
because comments come from client.rest_request() as dicts; remove the
unreachable else branch and always call parse_comment(comment) when building
results (refer to parse_comment and the loop over comment), and update the
function's type hints to accept list[dict] for input and return the appropriate
parsed type; alternatively, if you intended to accept already-parsed
GitHubComment objects, change the parameter/type annotation to list[dict |
GitHubComment] and keep a runtime isinstance check—do one of these two
consistent fixes so parse_comment, client.rest_request(), and GitHubComment
usage line up.
- Around line 367-370: The else branch after handling reviews is dead code
because API returns dicts; in the block that checks "if isinstance(review,
dict): results.append(parse_review(review)) else: results.append(review)" remove
the unreachable else and always call results.append(parse_review(review))
(update where reviews are processed in fetcher.py and any helper that calls
parse_review to reflect the simplified flow).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f3159672-857b-416e-9f8b-d4129b7810fb
📒 Files selected for processing (35)
CHANGELOG.mdcore/errors.pycore/tests/test_errors.pycppa_slack_tracker/api_schemas.pycppa_slack_tracker/fetcher.pycppa_slack_tracker/models.pycppa_slack_tracker/services.pycppa_slack_tracker/sync/sync_channel.pycppa_slack_tracker/sync/sync_message.pycppa_slack_tracker/sync/sync_user.pycppa_slack_tracker/tests/test_api_schemas.pycppa_slack_tracker/tests/test_fetcher_unit.pycppa_slack_tracker/tests/test_services.pycppa_user_tracker/services.pycppa_user_tracker/tests/test_services.pydiscord_activity_tracker/live_sync_schemas.pydiscord_activity_tracker/services.pydiscord_activity_tracker/sync/messages.pydiscord_activity_tracker/sync/utils.pydiscord_activity_tracker/tests/test_live_sync_schemas.pydiscord_activity_tracker/tests/test_sync_messages.pydiscord_activity_tracker/tests/test_sync_utils.pydocs/service_api/cppa_slack_tracker.mddocs/service_api/cppa_user_tracker.mddocs/service_api/discord_activity_tracker.mdgithub_activity_tracker/api_schemas.pygithub_activity_tracker/fetcher.pygithub_activity_tracker/sync/commits.pygithub_activity_tracker/sync/issues_and_prs.pygithub_activity_tracker/tests/test_api_schemas.pygithub_activity_tracker/tests/test_fetcher.pygithub_activity_tracker/tests/test_fetcher_commits_backward.pygithub_activity_tracker/tests/test_fetcher_issues_and_prs.pygithub_activity_tracker/tests/test_sync_commits.pypyrightconfig.json
Summary:
Service functions accepted raw dict for external API data (e.g. get_or_create_slack_team(team_data: dict)). Untyped dicts flowed from fetchers through services via .get() key access, producing KeyError at arbitrary call depth on malformed payloads. Pyright could not verify data flow from fetcher → service.
Changes:
GitHub — new github_activity_tracker/api_schemas.py (GitHubIssue, GitHubPullRequest, GitHubCommit, bundle wrappers, GitHubApiValidationError). fetcher.py validates at yield time; sync/issues_and_prs.py and sync/commits.py accept typed bundles; workspace JSON files are validated on load.
Slack — new cppa_slack_tracker/api_schemas.py (SlackTeamPayload, SlackChannelPayload, SlackMessagePayload, SlackUserPayload, SlackApiValidationError). fetcher.py returns typed payloads; all three service boundary functions updated; get_or_create_slack_user in cppa_user_tracker updated; sync callers follow.
Discord (live-sync path) — new discord_activity_tracker/live_sync_schemas.py (DiscordLiveUserPayload, DiscordLivePreparedMessage, DiscordReactionPayload, DiscordLiveSyncValidationError). sync/utils.py parse_discord_user now returns a typed model; sync/messages.py _prepare_message_data returns Optional[DiscordLivePreparedMessage]; bulk_upsert_discord_users/messages/reactions in services.py accept typed lists — eliminating the KeyError paths documented in their own docstrings. (Discord ChatExporter path was already validated via staging_schema.py; untouched.)
Cross-cutting — core/errors.py classify_failure now maps pydantic.ValidationError and all four app *ValidationError types to VALIDATION. cppa_slack_tracker added to pyrightconfig.json include.
Tests — test_api_schemas.py for GitHub and Slack; test_live_sync_schemas.py for Discord; test_sync_utils.py updated for typed return; test_errors.py extended; test_get_or_create_slack_team_requires_team_id asserts SlackApiValidationError.
Refs: Eval Test 6 / B1. Unblocks contract testing (issue 7 / B8).
Related Issues
Summary by CodeRabbit
New Features
Improvements
Tests