Matas/fix/fix actions #41
Conversation
Caution: Review failed. The pull request is closed.

Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Introduces public repository analysis with unauthenticated and authenticated access, per-IP/per-user rate limiting, a graph-based repository analysis agent backed by GitHub integrations and LLM-driven rule generation, widespread Ruff-centric tooling consolidation, and expanded API, models, and tests for the new analysis flow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Client
participant API as API Endpoint
participant Auth as Auth Dependency
participant RateLimit as Rate Limiter
participant Agent as Repository Analysis Agent
participant GitHub as GitHub API
participant LLM as LLM Provider
participant Response as Response Builder
Client->>API: POST /api/v1/rules/recommend (repo_url, optional token)
API->>Auth: get_current_user_optional()
Auth-->>API: User | None
API->>RateLimit: rate_limiter(request, user)
alt Over limit
RateLimit-->>API: HTTP 429 + Retry-After
API-->>Client: 429 Rate Limited
else Under limit
RateLimit-->>API: Proceed
API->>Agent: execute(repo_full_name, ...)
Agent->>GitHub: fetch_repository_metadata()
GitHub-->>Agent: repo signals (files, languages, workflows)
Agent->>GitHub: fetch_pr_signals()
GitHub-->>Agent: PR data
Agent->>Agent: compute_hygiene_metrics()
Agent->>LLM: generate_rule_recommendations()
LLM-->>Agent: RuleRecommendation[]
Agent-->>API: AnalysisState (recommendations, metadata)
API->>Response: Build AnalysisResponse
Response-->>Client: 200 {recommendations, features, metadata}
end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details

Configuration used: Organization UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
📒 Files selected for processing (26)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary of Changes

Hello @MT-superdev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a significant overhaul of the repository analysis and rule recommendation system. The primary goal is to enhance the robustness, maintainability, and intelligence of the platform, particularly through the introduction of an 'AI Immune System' capable of detecting and mitigating low-quality or AI-generated contributions. This is achieved by refactoring the core agent to use a LangGraph-based state machine, improving GitHub API interactions, strengthening type safety, and implementing comprehensive error handling and structured logging. Additionally, the API now supports unauthenticated access for public repositories with appropriate rate limiting, making the service more accessible while maintaining stability.

Highlights
Ignored Files
Codecov Report

❌ Patch coverage is 63.6%.
❌ Your patch status has failed because the patch coverage (63.6%) is below the target coverage (80.0%). You can increase the patch coverage or adjust the target coverage.

```diff
@@ Coverage Diff @@
## main #41 +/- ##
========================================
+ Coverage 32.8% 43.6% +10.7%
========================================
Files 85 101 +16
Lines 5162 6104 +942
========================================
+ Hits 1698 2666 +968
+ Misses 3464 3438 -26
```

Continue to review full report in Codecov by Sentry.
Code Review
This pull request introduces several significant improvements to the Watchflow project, including enhanced URL parsing, structured logging, specific error handling, improved type safety, new data models, GitHub integration enhancements, rate limiting, and execution timeouts. The code review focuses on identifying potential issues related to correctness and maintainability, particularly concerning error handling and adherence to best practices.
```diff
 # Linter (Fixes imports, modernizes syntax, checks bugs)
 - id: ruff
   name: "Linting and import sorting"
-  args: ["--fix"]
-
-# Pyupgrade: Check and fix Python version incompatibilities and outdated syntax
-- repo: https://github.com/asottile/pyupgrade
-  rev: v3.15.2
-  hooks:
-    - id: pyupgrade
-      name: "Upgrade syntax for Python 3.12+"
-      args: [--py312-plus]
+  # Auto-fixes outdated syntax to Python 3.12+ compatible code
+  args: [--fix, --exit-non-zero-on-fix]
 # Formatter (Replaces Black)
 - id: ruff-format
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check-ast hook has been removed. While Ruff generally covers AST checks, ensure that the removal of this specific hook doesn't inadvertently reduce the scope of syntax validation, potentially missing edge cases that check-ast might have caught. Consider adding a Ruff rule that specifically covers the checks previously done by check-ast.
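For context, `check-ast`'s core behavior is simply verifying that each file parses; the following is an illustrative stand-in sketch (not the hook's actual source):

```python
# Illustrative stand-in for pre-commit's check-ast hook: it only verifies
# that each file parses into a valid Python AST.
import ast
import sys


def check_ast(filenames: list[str]) -> int:
    exit_code = 0
    for name in filenames:
        with open(name, "rb") as f:
            source = f.read()
        try:
            ast.parse(source, filename=name)
        except SyntaxError as exc:
            print(f"{name}: failed to parse: {exc}")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    sys.exit(check_ast(sys.argv[1:]))
```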
| "ruff>=0.1.0", # Replaces black, isort, flake8 | ||
| ] |
```python
except Exception as e:
    # Catching Exception here is only for the top-level orchestration safety
    logger.exception("agent_execution_failed", agent="repository_analysis", error=str(e))
    return AgentResult(success=False, message=str(e))
```
```python
from src.integrations.github.api import github_client
from src.integrations.providers.factory import get_chat_model

import logging
from typing import Any

logger = structlog.get_logger()
```
```python
except openai.OpenAIError as e:
    logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="llm_provider_error")
    fallback_reason = f"AI provider error: {e.__class__.__name__}"
except pydantic.ValidationError as e:
    logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="schema_mismatch")
    fallback_reason = "AI model returned data in an unexpected format."
except Exception as e:
    logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="unknown_error", exc_info=True)
    fallback_reason = f"An unexpected error occurred: {str(e)}"
```
The `except openai.OpenAIError`, `except pydantic.ValidationError`, and final `except Exception` blocks together catch all exceptions. While this prevents the agent from crashing, the broad catch-all might mask specific, actionable exceptions. Consider logging the exception type and message for better debugging and potentially handling specific exception types differently.
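To illustrate, a sketch of narrower handling might look like the following. `call_llm` is a hypothetical stand-in for the real generation call; `openai.RateLimitError` and `openai.APIConnectionError` are subclasses of `openai.OpenAIError` in the v1 SDK:

```python
import openai
import pydantic
import structlog

logger = structlog.get_logger()


async def call_llm(repo_name: str) -> list:
    """Hypothetical stand-in for the real LLM generation call."""
    return []


async def generate_rules_safely(repo_name: str) -> list | None:
    try:
        return await call_llm(repo_name)
    except openai.RateLimitError as e:
        # Actionable: the caller can retry with backoff instead of falling back.
        logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="rate_limited")
    except openai.APIConnectionError as e:
        # Actionable: likely a transient network failure.
        logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="network")
    except openai.OpenAIError as e:
        logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="llm_provider_error")
    except pydantic.ValidationError as e:
        logger.error("rule_generation_failed", repo=repo_name, error=str(e), error_type="schema_mismatch")
    return None
```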
| logger.warning(f"Invalid URL provided by {client_ip}: {repo_url_str}") | ||
| raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail=str(e)) from e | ||
|
|
||
| # Step 2: Rate limiting—TODO: use Redis. For now, agent handles GitHub 429s internally. |
| logger.error(f"Error listing PRs for {repo}: {e}") | ||
| return [] | ||
|
|
||
| async def _get_session(self) -> aiohttp.ClientSession: |
```python
# Fallback: ensure we have a valid session
self._session = aiohttp.ClientSession()
return self._session
```
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/rules/validators.py (1)
622-630: Exception handling may fail if `pytz` is not installed.

The exception clause references `pytz.exceptions.UnknownTimeZoneError`, but if `pytz` is not installed, this will raise a `NameError` when Python evaluates the exception tuple, before catching `ImportError`.

🐛 Suggested fix
```diff
 try:
     import pytz

     tz = pytz.timezone(timezone_str)
     current_time = datetime.now(tz)
-except (ImportError, pytz.exceptions.UnknownTimeZoneError):
-    # Fallback to UTC if pytz is not available or timezone is invalid
-    logger.warning(f"Invalid timezone '{timezone_str}', using UTC")
-    current_time = datetime.now()
+except ImportError:
+    logger.warning(f"pytz not available, using local time for timezone '{timezone_str}'")
+    current_time = datetime.now()
+except Exception as e:
+    # Catches UnknownTimeZoneError and other pytz errors
+    logger.warning(f"Invalid timezone '{timezone_str}': {e}, using local time")
+    current_time = datetime.now()
```

Alternatively, move the import to the top of the file and handle gracefully:
```python
try:
    import pytz

    PYTZ_AVAILABLE = True
except ImportError:
    PYTZ_AVAILABLE = False
```

src/api/rules.py (1)
14-27: Missing error handling for agent execution.

The comment on line 19 notes that "agent may throw if rule malformed," but there's no try/except block. An unhandled exception would result in a 500 error with potentially sensitive stack trace information in the response.
Consider adding explicit error handling:
Proposed fix
```diff
 @router.post("/rules/evaluate")
 async def evaluate_rule(request: RuleEvaluationRequest):
     # Agent: uses central config—change here affects all rule evals.
     agent = get_agent("feasibility")
-    # Async call—agent may throw if rule malformed.
-    result = await agent.execute(rule_description=request.rule_text)
-
-    # Output: keep format stable for frontend. Brittle if agent changes keys.
-    return {
-        "supported": result.data.get("is_feasible", False),
-        "snippet": result.data.get("yaml_content", ""),
-        "feedback": result.message,
-    }
+    try:
+        result = await agent.execute(rule_description=request.rule_text)
+        return {
+            "supported": result.data.get("is_feasible", False),
+            "snippet": result.data.get("yaml_content", ""),
+            "feedback": result.message,
+        }
+    except Exception as e:
+        return {
+            "supported": False,
+            "snippet": "",
+            "feedback": f"Evaluation failed: {str(e)}",
+        }
```
🤖 Fix all issues with AI agents
In `@docs/benchmarks.md`:
- Line 9: The phrase "70 + enterprise policies" contains an extra space and
should be normalized to "70+ enterprise policies" to match the later usage;
update the string "Our analysis of 70 + enterprise policies..." to remove the
space between 70 and + so it reads "70+ enterprise policies" (leave surrounding
text and "85% of real-world governance policies require context" unchanged).
In `@src/agents/feasibility_agent/agent.py`:
- Line 2: Fix the double space in the module/class docstring: locate the
module-level docstring at the top of src/agents/feasibility_agent/agent.py (and
any docstring on the FeasibilityAgent class if present) that currently reads
"error handling and retry logic." and remove the extra space so it reads "error
handling and retry logic."
In `@src/agents/repository_analysis_agent/models.py`:
- Around line 6-17: Replace the manual string hacks in
parse_github_repo_identifier with giturlparse parsing: call
giturlparse.parse(identifier), and if the result is valid use
f"{parsed.owner}/{parsed.repo}" (or parsed.owner + "/" + parsed.name if repo is
empty) to normalize SSH and HTTPS forms; if giturlparse reports an invalid URL,
fall back to the existing strip/remove logic (remove protocol/domain, ".git",
and trailing slashes) or raise a clear ValueError to surface bad input. Ensure
you update parse_github_repo_identifier to import and use giturlparse so SSH
URLs like "[email protected]:owner/repo.git" and https URLs are handled
consistently.
In `@src/agents/repository_analysis_agent/nodes.py`:
- Around line 111-113: The calls to github_client.get_file_content and similar
methods in nodes.py are hardcoding installation_id=None which prevents access to
private repos; change those calls (e.g., the get_file_content invocation and the
other occurrences around the blocks you flagged) to pass the actual
installation_id available in the current scope (or a provided parameter like
installation_id) instead of None so authenticated requests are used for private
repos; update every instance you noted (lines ~111, ~126, ~145, ~195 ranges) to
forward the real installation_id variable to github_client methods.
In `@src/api/dependencies.py`:
- Around line 34-41: The bearer parsing currently accepts any token and returns
a fixed User; update the logic around auth_header / scheme / token (the block
that does scheme, token = auth_header.split() and returns User(...,
github_token=token)) so that you do not grant a fixed authenticated identity for
arbitrary strings: either validate the token with the real IdP or GitHub (verify
JWT signature or call the appropriate token-introspect/verify endpoint) and
populate User with the real identity on success, or treat unverified tokens as
anonymous and return None/anonymous user instead of User(id=123,...). Replace
the TODO stub with a call to a verification helper (e.g., verify_idp_token or
verify_github_token) and only construct/return User when that verification
passes.
In `@src/api/rate_limit.py`:
- Around line 21-28: The rate_limiter dependency always receives user=None
because it uses Depends(lambda: None); replace that with
Depends(get_current_user_optional) (and add an import for
get_current_user_optional if missing) so the rate_limiter function can detect
authenticated users by inspecting the user argument and correctly switch between
AUTH_LIMIT and ANON_LIMIT (refer to the rate_limiter function and
get_current_user_optional dependency).
In `@src/api/recommendations.py`:
- Around line 88-99: Replace raw PII in logs by hashing or redacting the user
identifier and client IP before logging: compute a one-way hash (e.g., sha256)
of client_ip and user.email (fallback values like "unknown"/"Anonymous" can be
hashed too) and use those hashed values in the logger.info and logger.warning
messages around repo_url_str and parse_repo_from_url; ensure the new variables
(hashed_user_id, hashed_client_ip) are used in place of user_id/client_ip so no
raw email or IP is written to logs.
- Line 5: The model currently imports and uses pydantic.HttpUrl which blocks
SSH/Git URLs; change the affected model fields from HttpUrl to plain str (remove
HttpUrl usage and its import) so SSH/git@... URLs are accepted, and rely on the
existing parse_repo_from_url function to validate/parse those strings (update
any model or field declarations that reference HttpUrl—e.g., the request model
fields at the top of src/api/recommendations.py and the fields around lines
27-29—to be str and keep Field(...) metadata unchanged).
- Around line 46-62: The host validation in parse_repo_from_url is too
permissive and allows lookalike domains; update the check to first ensure p.host
is not None and then perform a case-insensitive exact equality comparison
against "github.com" (e.g., compare p.host.lower() == "github.com") instead of
using `"github.com" in p.host`, leaving the rest of the ValueError behavior
intact; reference parse_repo_from_url and the p.host field when making this
change.
In `@src/core/models.py`:
- Around line 4-18: Change the User.github_token field to use pydantic's
SecretStr and mark it excluded from serialization; specifically import SecretStr
and update the github_token declaration in class User to use type SecretStr with
Field(default=None, description="OAuth token for GitHub API access",
exclude=True, repr=False) so the secret is stored securely and omitted from
model_dump()/serialization and reprs.
In `@src/integrations/github/api.py`:
- Around line 950-953: Inside the function that performs HTTP requests in
src/integrations/github/api.py remove the local "import httpx" and the except
block catching httpx.HTTPStatusError (this is dead because the code uses
aiohttp); instead catch aiohttp.ClientResponseError (or aiohttp.ClientError for
broader coverage) in the same except block and log the caught exception via the
existing logger (structlog.get_logger()) so the error details are recorded;
locate the try/except around the aiohttp request and replace the httpx-specific
handler with the aiohttp exception type and remove the unused import.
In `@src/integrations/github/service.py`:
- Around line 66-74: The current loop treating any non-200 from the contents API
as "file missing" hides repo-level errors (auth, rate limits, missing repo); add
a preflight repo validation before iterating files: call the repo endpoint
(e.g., f"{self.BASE_URL}/repos/{owner}/{repo}") or an authenticated HEAD on the
repo to check for 200 and explicitly handle 401/403/404/rate-limit responses
(raise or return an error) rather than appending to found_files; keep the
existing per-file checks only after the preflight succeeds and reuse headers and
the AsyncClient used around files_to_check to avoid duplicate clients.
In `@src/main.py`:
- Around line 88-96: The current CORSMiddleware setup (app.add_middleware with
CORSMiddleware and allow_origins=["*"], allow_credentials=True) violates the
CORS spec; either replace the wildcard origins with an explicit list of allowed
origins (set allow_origins to the exact host(s) you expect) when keeping
allow_credentials=True, or if you must keep wildcard origins set
allow_credentials=False; update the CORSMiddleware config accordingly in the
app.add_middleware call to use explicit origins or disable credentials so
browsers will accept the responses.
In `@tests/conftest.py`:
- Around line 5-48: The hard-coded base64 literal in the mock_settings fixture
(PRIVATE_KEY_BASE64_GITHUB) triggers secret scanners; change it to compute the
value at runtime instead (e.g., base64-encode a non-secret byte string) so tests
still get a valid value without a literal. Update the mock_settings fixture to
import/use Python's base64 (or encode b"mock-key" and decode to str) and set
PRIVATE_KEY_BASE64_GITHUB to that computed string rather than the literal
"bW9jay1rZXk=".
🧹 Nitpick comments (11)
src/core/utils/logging.py (1)
148-149: Comment is slightly verbose but harmless.

The comment on line 148 restates what the TYPE_CHECKING block already conveys. Consider simplifying or removing it, but this is purely cosmetic.
Optional: simplify the comment
```diff
-# Callable is now only needed for typing, so it's safe to use the string name or handled by __future__
+# Type annotation only; Callable imported under TYPE_CHECKING
 log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
```

src/core/utils/caching.py (1)
161-163: Minor edge case with `ttl=0`.

The ternary refactor is clean. Note that `ttl=0` would be falsy, causing `TTLCache` with `ttl=3600` to be created instead of `AsyncCache` with zero TTL. This is unlikely to be a real issue since a zero-TTL cache has no practical use, but consider using `ttl is not None` if you want to allow explicit zero values in the future.

```diff
-cache = AsyncCache(maxsize=maxsize, ttl=ttl) if ttl else TTLCache(maxsize=maxsize, ttl=ttl or 3600)
+cache = AsyncCache(maxsize=maxsize, ttl=ttl) if ttl is not None else TTLCache(maxsize=maxsize, ttl=3600)
```

src/rules/validators.py (3)
201-206: Consider using the existing `_compile_glob` function.

There are two glob-to-regex conversion implementations in this file:

- `_compile_glob` (line 13): handles `**` for multi-directory matching and uses caching
- `_glob_to_regex` (line 202): simpler, doesn't distinguish `*` from `**`

The top-level `_compile_glob` is more robust and already used by `_matches_any`. Consider consolidating to avoid inconsistent behavior.

♻️ Suggested refactor
```diff
 @staticmethod
 def _glob_to_regex(glob_pattern: str) -> str:
     """Converts a glob pattern to a regex pattern."""
-    # Simple glob→regex—fragile, production needs better.
-    regex = glob_pattern.replace(".", "\\.").replace("*", ".*").replace("?", ".")
-    return f"^{regex}$"
+    # Reuse the module-level implementation for consistency.
+    return _compile_glob(glob_pattern).pattern
```
229-243: Consider consolidating `ApprovalCountCondition` with `MinApprovalsCondition`.

`ApprovalCountCondition` (always returns `True`) duplicates the intent of `MinApprovalsCondition` (lines 279-302), which has a working implementation. Having both in the registry under different names (`has_min_approvals` and `min_approvals`) could cause confusion.

Consider either (see the sketch after this list):

- Removing `ApprovalCountCondition` and aliasing `has_min_approvals` to `MinApprovalsCondition`
- Delegating `ApprovalCountCondition.validate` to `MinApprovalsCondition`
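A minimal sketch of the delegation option; the class and method names come from the comment above, but the parameter shapes and bodies are assumptions:

```python
from typing import Any


class MinApprovalsCondition:
    """Working implementation referenced at lines 279-302 (body illustrative)."""

    async def validate(self, params: dict[str, Any], event: dict[str, Any]) -> bool:
        approvals = event.get("approval_count", 0)
        return approvals >= params.get("min_approvals", 1)


class ApprovalCountCondition:
    """Keeps the has_min_approvals registry name but reuses the working logic."""

    def __init__(self) -> None:
        self._delegate = MinApprovalsCondition()

    async def validate(self, params: dict[str, Any], event: dict[str, Any]) -> bool:
        return await self._delegate.validate(params, event)
```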
189-199: `_get_changed_files` always returns an empty list.

Both the `pull_request` and `push` branches return empty lists, making `FilePatternCondition` effectively non-functional: it will always return `False` (lines 171-173). The TODOs acknowledge this, but consider adding a more prominent warning or raising `NotImplementedError` to make the placeholder status explicit during testing.

♻️ Optional: Make placeholder status explicit
```diff
 def _get_changed_files(self, event: dict[str, Any]) -> list[str]:
     """Extracts the list of changed files from the event."""
     event_type = event.get("event_type", "")
     if event_type == "pull_request":
         # TODO: Pull request—fetch changed files via GitHub API. Placeholder for now.
+        logger.warning("FilePatternCondition: Changed files extraction not implemented for pull_request")
         return []
     elif event_type == "push":
         # Push event—files in commits, not implemented.
+        logger.warning("FilePatternCondition: Changed files extraction not implemented for push")
         return []
     else:
         return []
```
47-112: Deduplication infrastructure is unused.

The `_create_event_hash` method (lines 47-112) and the `event_hashes` dict are fully implemented, but `event_hash` is always set to `None` at line 126 with the comment "No deduplication for now". This means duplicate webhook events will be processed multiple times.

If deduplication is not intended for this release, consider either:
- Adding a TODO comment explaining when this will be enabled, or
- Removing the unused code to reduce maintenance burden.
Would you like me to open an issue to track enabling the deduplication feature?
Also applies to: 126-126
.pre-commit-config.yaml (1)
17-24: Ruff version is significantly outdated.

The Ruff version `v0.3.0` is from early 2024. The latest release is `v0.14.13` (January 2026). Consider updating to a recent version for improved linting coverage, bug fixes, and new rules.
243-250: Session lifecycle pattern doesn't align with aiohttp best practices.

Accessing `self._session._loop` (a private attribute) is fragile, but the deeper issue is that maintaining a long-lived session and trying to detect event loop changes contradicts aiohttp's design. According to aiohttp documentation, sessions should be created and managed within their intended event loop context, typically using `async with aiohttp.ClientSession()` or ensuring the session lifecycle is tied to a proper async startup/shutdown hook. The suggested alternative using `connector.closed` still relies on implementation details.

Instead of detecting loop changes, align with aiohttp's recommendation: manage the session's lifecycle relative to the running event loop, or accept it as a parameter from the caller.
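A sketch of that direction, tying the session to an async context instead of probing private attributes (the class name is reused from the discussion; details are assumptions):

```python
import aiohttp


class GitHubClient:
    """Owns a ClientSession only while inside an async context."""

    def __init__(self) -> None:
        self._session: aiohttp.ClientSession | None = None

    async def __aenter__(self) -> "GitHubClient":
        self._session = aiohttp.ClientSession()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        if self._session is not None:
            await self._session.close()
            self._session = None

    @property
    def session(self) -> aiohttp.ClientSession:
        if self._session is None or self._session.closed:
            raise RuntimeError("GitHubClient must be used inside 'async with'")
        return self._session
```

With FastAPI, the same idea maps onto startup/shutdown (lifespan) hooks so the session is always created inside the running event loop.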
src/integrations/github/service.py (1)
107-115: Unify GitHub URL parsing to handle SSH/.git consistently.

`split("/")` fails for SSH URLs and preserves `.git` suffixes, which can generate invalid API paths if those forms ever reach this layer. Consider reusing the same robust parsing strategy used elsewhere (e.g., giturlparse) or move it into a shared utility.

♻️ Suggested refactor (giturlparse)
```diff
-import logging
-from typing import Any
+import logging
+from typing import Any
+
+from giturlparse import parse as parse_git_url
 ...
     def _parse_url(self, url: str) -> tuple[str, str]:
         """
         Extracts owner and repo from https://github.com/owner/repo
         """
-        clean_url = str(url).rstrip("/")
-        parts = clean_url.split("/")
-        if len(parts) < 2:
-            raise ValueError("Invalid GitHub URL")
-        return parts[-2], parts[-1]
+        p = parse_git_url(str(url))
+        if not p.valid or not p.owner or not p.repo or "github.com" not in (p.host or ""):
+            raise ValueError("Invalid GitHub URL")
+        return p.owner, p.repo
```

src/api/dependencies.py (1)
8-9: Remove duplicate logger initialization.

This is harmless but redundant.
♻️ Suggested cleanup
```diff
-logger = logging.getLogger(__name__)  # Logger: keep at module level for reuse.
-logger = logging.getLogger(__name__)
+logger = logging.getLogger(__name__)  # Logger: keep at module level for reuse.
```

src/main.py (1)
121-125: Snapshot tasks list to keep counts consistent.

Multiple passes over a live view can drift if tasks mutate mid-request; take a snapshot once.
♻️ Suggested refactor
```diff
-    tasks = task_queue.tasks.values()
+    tasks = list(task_queue.tasks.values())
     pending_count = sum(1 for t in tasks if t.status.value == "pending")
     running_count = sum(1 for t in tasks if t.status.value == "running")
     completed_count = sum(1 for t in tasks if t.status.value == "completed")
     failed_count = sum(1 for t in tasks if t.status.value == "failed")
 @@
-        "total": len(tasks),
+        "total": len(tasks),
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (45)
- .github/workflows/docs.yaml
- .pre-commit-config.yaml
- README.md
- docs/benchmarks.md
- docs/features.md
- linting_commands.txt
- pyproject.toml
- src/__init__.py
- src/agents/base.py
- src/agents/feasibility_agent/agent.py
- src/agents/repository_analysis_agent/agent.py
- src/agents/repository_analysis_agent/models.py
- src/agents/repository_analysis_agent/nodes.py
- src/agents/repository_analysis_agent/prompts.py
- src/agents/repository_analysis_agent/test_agent.py
- src/api/dependencies.py
- src/api/rate_limit.py
- src/api/recommendations.py
- src/api/rules.py
- src/core/config/settings.py
- src/core/models.py
- src/core/utils/caching.py
- src/core/utils/logging.py
- src/integrations/github/api.py
- src/integrations/github/service.py
- src/integrations/providers/bedrock_provider.py
- src/integrations/providers/factory.py
- src/main.py
- src/rules/models.py
- src/rules/utils/contributors.py
- src/rules/validators.py
- src/tasks/scheduler/deployment_scheduler.py
- src/tasks/task_queue.py
- src/webhooks/auth.py
- src/webhooks/dispatcher.py
- src/webhooks/handlers/deployment.py
- src/webhooks/handlers/deployment_status.py
- src/webhooks/handlers/issue_comment.py
- src/webhooks/handlers/pull_request.py
- src/webhooks/router.py
- tests/conftest.py
- tests/integration/test_recommendations.py
- tests/integration/test_rules_api.py
- tests/unit/agents/test_repository_analysis_models.py
- tests/unit/api/test_recommendations.py
💤 Files with no reviewable changes (1)
- src/agents/repository_analysis_agent/test_agent.py
🧰 Additional context used
🧬 Code graph analysis (16)
tests/unit/agents/test_repository_analysis_models.py (1)
  src/agents/repository_analysis_agent/models.py (4): HygieneMetrics (119-141), PRSignal (94-116), RepositoryAnalysisRequest (20-37), parse_github_repo_identifier (6-17)
src/webhooks/handlers/deployment.py (1)
  src/core/models.py (1): repo_full_name (56-58)
tests/integration/test_recommendations.py (1)
  src/core/utils/caching.py (1): get (44-66)
src/agents/repository_analysis_agent/agent.py (3)
  src/agents/base.py (5): AgentResult (16-29), BaseAgent (32-119), _build_graph (55-60), execute (114-119), _execute_with_timeout (93-111)
  src/agents/repository_analysis_agent/nodes.py (3): fetch_repository_metadata (58-175), fetch_pr_signals (178-258), generate_rule_recommendations (261-336)
  src/core/models.py (1): repo_full_name (56-58)
tests/unit/api/test_recommendations.py (1)
  src/api/recommendations.py (1): parse_repo_from_url (46-62)
src/agents/base.py (4)
  src/agents/repository_analysis_agent/agent.py (2): _build_graph (22-40), execute (42-76)
  src/agents/feasibility_agent/agent.py (2): _build_graph (34-55), execute (57-123)
  src/agents/acknowledgment_agent/agent.py (2): _build_graph (36-52), execute (212-217)
  src/agents/engine_agent/agent.py (2): _build_graph (54-73), execute (75-173)
src/api/rate_limit.py (2)
  src/core/models.py (1): User (7-18)
  tests/integration/test_rules_api.py (1): client (21-23)
src/api/dependencies.py (3)
  src/core/models.py (1): User (7-18)
  src/integrations/github/service.py (1): GitHubService (19-115)
  src/core/utils/caching.py (1): get (44-66)
src/integrations/providers/factory.py (2)
  src/core/config/provider_config.py (2): get_max_tokens_for_agent (57-63), get_temperature_for_agent (65-71)
  tests/unit/test_feasibility_agent.py (1): agent (18-31)
src/main.py (3)
  src/webhooks/handlers/deployment_protection_rule.py (1): DeploymentProtectionRuleEventHandler (10-33)
  src/tasks/scheduler/deployment_scheduler.py (2): get_deployment_scheduler (364-369), stop (40-48)
  src/tasks/task_queue.py (1): stop_workers (143-149)
src/tasks/task_queue.py (3)
  src/webhooks/handlers/issue_comment.py (1): event_type (19-20)
  src/core/models.py (1): repo_full_name (56-58)
  src/core/utils/caching.py (1): get (44-66)
src/webhooks/router.py (2)
  src/webhooks/handlers/issue_comment.py (1): event_type (19-20)
  src/core/models.py (1): EventType (24-38)
src/core/utils/caching.py (4)
  src/core/utils/logging.py (1): decorator (86-128)
  src/core/utils/metrics.py (1): decorator (83-143)
  src/core/utils/retry.py (2): decorator (45-76), wrapper (47-74)
  src/core/utils/timeout.py (2): decorator (65-73), wrapper (66-71)
src/agents/repository_analysis_agent/models.py (1)
  src/core/models.py (1): repo_full_name (56-58)
src/webhooks/dispatcher.py (9)
  src/webhooks/handlers/deployment.py (1): handle (17-46)
  src/webhooks/handlers/deployment_status.py (1): handle (17-50)
  src/webhooks/handlers/issue_comment.py (2): handle (25-147), event_type (19-20)
  src/webhooks/handlers/pull_request.py (1): handle (18-45)
  src/webhooks/handlers/base.py (1): handle (16-26)
  src/webhooks/handlers/deployment_protection_rule.py (1): handle (16-33)
  src/webhooks/handlers/deployment_review.py (1): handle (16-33)
  src/webhooks/handlers/push.py (1): handle (16-29)
  src/webhooks/handlers/check_run.py (1): handle (16-29)
src/integrations/github/api.py (2)
  src/core/models.py (1): repo_full_name (56-58)
  src/core/utils/caching.py (1): get (44-66)
🪛 Gitleaks (8.30.0)
src/integrations/github/api.py
[high] 104-104: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 120-120: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
tests/conftest.py
[high] 42-42: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (63)
src/core/utils/logging.py (1)
8-18: LGTM! Clean typing refactor using TYPE_CHECKING pattern.

The combination of `from __future__ import annotations` with the `TYPE_CHECKING` guard for `Callable` is the correct approach. With PEP 563, annotations are stored as strings and not evaluated at runtime, so `Callable` is safely available only for static type checkers without any runtime import overhead.
61-65: LGTM! Clean refactor combining nested conditionals.

The short-circuit evaluation of `and` ensures `_has_recent_activity()` is only called when `contributions >= min_contributions`, preserving the original behavior while improving readability.

Minor nit: the `# SIM102: Combined nested if statements` comment could be removed; linter rule references are better suited for commit messages than permanent code comments.
169-170: LGTM!

Clean ternary simplification for cache key generation. The logic is preserved correctly.
173-175: LGTM!

Good unification of the `.get()` call. Both `AsyncCache` and `cachetools.TTLCache` have compatible `.get()` interfaces that return `None` for missing/expired keys.

src/rules/validators.py (4)
85-91: LGTM!

The clarifying comment about validator metadata is helpful for maintainability.
138-147: LGTM!

The TODO comment appropriately flags the placeholder implementation, and the code correctly returns `False` for unknown teams by defaulting to an empty list.
223-226: LGTM!

The TODO comment appropriately flags the placeholder implementation with hardcoded contributors.
246-260: LGTM with a note on timezone handling.

The comment correctly flags that `datetime.now()` uses local time, which may behave inconsistently across deployments. For consistency with `AllowedHoursCondition` (which accepts a `timezone` parameter), consider adding timezone support here in a future iteration.
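A sketch of what that could look like with the standard library; the parameter name mirrors `AllowedHoursCondition`, and the UTC fallback is an assumption:

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def current_time_in(timezone_str: str) -> datetime:
    """Return an aware 'now' so behavior is consistent across deployments."""
    try:
        return datetime.now(ZoneInfo(timezone_str))
    except ZoneInfoNotFoundError:
        return datetime.now(ZoneInfo("UTC"))


print(current_time_in("Europe/Athens"))
print(current_time_in("Not/AZone"))  # falls back to UTC
```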
src/tasks/scheduler/deployment_scheduler.py (1)

2-47: LGTM: cancellation handling is cleaner with `contextlib.suppress`.

src/__init__.py (1)
1-1: LGTM: comment clarifies intent.

tests/conftest.py (2)
11-29: LGTM: test helpers and src path setup.
54-61: LGTM: dedicated session event loop fixture.

.github/workflows/docs.yaml (1)

71-71: LGTM: deploy gated to main repo.

src/webhooks/auth.py (1)

32-45: LGTM: clearer signature verification notes + explicit success return.

.pre-commit-config.yaml (1)
14-14: Good addition of the `check-added-large-files` hook.

This hook helps prevent accidentally committing large files to the repository, which is a good safeguard for repository hygiene.
src/agents/base.py (2)
54-60: LGTM - pragmatic typing for polymorphic return.

The `Any` return type is appropriate here since subclasses return different compiled graph types (e.g., `StateGraph` in `acknowledgment_agent`, `Any` in `repository_analysis_agent`). The docstring clearly documents the rationale.
113-119: LGTM - explicit kwargs typing for MyPy strict mode.

The `**kwargs: Any` typing satisfies MyPy strict mode requirements while maintaining flexibility for subclass implementations with varying parameter needs.

src/integrations/providers/factory.py (2)
8-9: Good addition of future annotations.

The `from __future__ import annotations` import enables PEP 563 postponed annotation evaluation, which is beneficial for forward references and aligns with modern Python practices.
78-79: LGTM - cleaner ternary expressions.

The refactored ternary expressions are more concise while maintaining the same precedence logic: explicit params > agent config > global config.
src/tasks/task_queue.py (1)
35-35: LGTM - improved documentation clarity.

The updated comments better explain the purpose of deduplication fields and event-specific hash components.
Also applies to: 43-43, 49-49, 57-57, 65-68, 89-89
src/rules/models.py (1)
20-28: LGTM - well-structured category enum.

The `RuleCategory` enum is well-documented with clear inline comments explaining each category's purpose. The `HYGIENE` category aligns with the PR's "AI Immune System" features for AI spam detection and contribution governance.

src/integrations/providers/bedrock_provider.py (2)
151-167: LGTM: inference-profile matching is clearer and case-insensitive.
The explicit provider checks read well and reduce ambiguity.
178-181: LGTM.
Localized import is fine; just ensure the optional dependency is present when this path is used.

src/core/config/settings.py (2)
114-118: LGTM: centralized analysis feature settings.
Defaults look consistent with the intended limits/timeouts.
142-156: LGTM: provider validation is clearer.
Unified Bedrock/Vertex checks should improve config guardrails.

pyproject.toml (6)
28-29: Dependency additions look aligned with feature changes.
Please verify the minimum versions are sufficient for your usage.
33-46: LGTM: dev/docs dependency updates.
Verify the tool versions align with your CI expectations.
51-85: Ruff configuration consolidation looks solid.
Please confirm the configured options are supported by your minimum Ruff version.
91-94: LGTM: stricter mypy settings.
Ensure the team is ready for the stricter typing enforcement.
98-108: LGTM: pytest options remain consistent.
Verify these flags still match your CI coverage expectations.
117-118: LGTM: uv dev-dependencies update.
Please confirm the same min versions are used across your package managers.

linting_commands.txt (1)
1-9: LGTM: concise lint/type-check workflow.
Verify the commands align with your pinned tool versions.

src/core/models.py (2)
24-38: LGTM: str-backed EventType improves serialization ergonomics.
41-63: LGTM: helper docstrings are clearer.

src/webhooks/handlers/issue_comment.py (3)
35-143: LGTM: clearer command routing and explicit ignore path.
155-169: LGTM: more flexible acknowledgment parsing.
193-194: LGTM: `any()` simplifies help detection.
16-17: LGTM - Comment improvements add useful context.

The updated comments clarify the registry's purpose (hot-swap, test injection), dispatch behavior, and thread-safety caveats. No functional changes.
Also applies to: 52-53, 60-60
src/webhooks/handlers/deployment_status.py (1)
27-30: LGTM - Useful fragility annotations.

The comments accurately document the coupling to GitHub's payload structure and potential queue failures. Good defensive documentation.
Also applies to: 36-42
src/webhooks/handlers/deployment.py (1)
27-29: LGTM - Consistent documentation pattern.

Matches the fragility annotations in `deployment_status.py`, maintaining consistency across handlers.

Also applies to: 35-38
src/webhooks/router.py (1)
31-34: Good defensive handling for unknown events.

Returning 202 for unsupported event types prevents GitHub from retrying and avoids alert noise. The pattern of raising `HTTPException(status_code=202)` and catching it in the endpoint (lines 62-65) works but is unconventional; consider returning a sentinel or using a custom exception class for clearer intent.
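A sketch of the custom-exception alternative; `UnsupportedEvent` and the handler registry are hypothetical names, not the repository's actual code:

```python
from fastapi.responses import JSONResponse

HANDLERS: dict[str, object] = {}  # hypothetical handler registry


class UnsupportedEvent(Exception):
    """Raised when an event type has no registered handler."""


def dispatch(event_type: str) -> None:
    if event_type not in HANDLERS:
        raise UnsupportedEvent(event_type)
    # ... route to the matching handler ...


async def webhook_endpoint(event_type: str) -> JSONResponse:
    try:
        dispatch(event_type)
    except UnsupportedEvent:
        # 202 tells GitHub "received, intentionally ignored" and avoids retries.
        return JSONResponse(status_code=202, content={"detail": "event ignored"})
    return JSONResponse(status_code=200, content={"detail": "processed"})
```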
94-141: Excellent documentation on PRSignal and HygieneMetrics.

The detailed docstrings explaining the "AI Immune System" feature, signal purposes, and how metrics trigger defensive rules provide valuable context for future maintainers.
144-171: Clean blackboard pattern for AnalysisState.

Well-structured state model with clear separation of inputs, collected signals, outputs, and execution metadata. The `Field(default_factory=list)` usage is correct for mutable defaults.

src/integrations/github/api.py (3)
37-65: Clean priority-based authentication flow.

The three-tier auth priority (user token → installation token → anonymous) is well-structured, and the `allow_anonymous` flag cleanly enables public repo analysis.

Note: The static analysis warnings about "Generic API Key" on lines referencing `allow_anonymous` are false positives; they're flagging the parameter name, not actual secrets.
1044-1060: Good heuristic for issue reference detection.

The regex patterns cover the most common linking conventions (`#123`, `fixes #123`, `closes #456`). For a heuristic used in hygiene analysis, this coverage is appropriate.
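For illustration, a heuristic of this shape can be expressed as a single pattern; this is inferred from the examples above, not the repository's exact implementation:

```python
import re

# Matches "#123" optionally preceded by a linking keyword like "fixes" or "closes".
ISSUE_REF = re.compile(
    r"(?:\b(?:close[sd]?|fix(?:es|ed)?|resolve[sd]?)\s+)?#\d+",
    re.IGNORECASE,
)


def has_issue_reference(text: str) -> bool:
    return bool(ISSUE_REF.search(text or ""))


assert has_issue_reference("Fixes #123")
assert has_issue_reference("see #456 for details")
assert not has_issue_reference("no references here")
```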
99-130: Clean public repo access methods.

`get_repository` and `list_directory_any_auth` properly leverage the new `allow_anonymous=True` flow, enabling Phase 1 public analysis without requiring authentication.
32-49: Clean 404/rate-limit handling for metadata fetch.

Nice to see explicit checks before `raise_for_status()`.

README.md (1)
239-244: Docs update matches the new public/anonymous rate-limit behavior.

Clear and concise.
tests/unit/api/test_recommendations.py (1)
6-54: Solid coverage of URL parsing edge cases.

These cases align well with the updated parser behavior.
docs/features.md (1)
282-287: Documentation aligns with new rate-limit semantics.

Looks good.
tests/integration/test_rules_api.py (1)
28-28: Cleaner env flag check.

The direct inequality reads clearer while keeping behavior the same.
Also applies to: 71-71, 106-106
tests/unit/agents/test_repository_analysis_models.py (1)
1-43: Good coverage for new model types and normalization.

The added tests exercise URL normalization and the new hygiene/PR signal models well.
src/main.py (3)
16-18: Deployment protection rule handler is fully wired.

Import and dispatcher registration are consistent with existing handlers.
Also applies to: 54-63
38-38: Startup/shutdown logging is consistent and concise.

Also applies to: 65-65, 70-70, 78-78
103-103: Recommendations router now aligned with /api/v1.

tests/integration/test_recommendations.py (1)
1-145: Great end-to-end coverage for public/private/auth cases.

src/agents/repository_analysis_agent/prompts.py (1)
1-66: Prompt constraints are explicit and well-scoped.

src/agents/repository_analysis_agent/nodes.py (2)
261-336: Structured output handling + fallback path look solid.
17-54: The PR payload mapping is correct.

`fetch_recent_pull_requests` explicitly populates all required fields: `number`, `has_issue_ref`, `author_association`, and `lines_changed`. The GraphQL/REST response properly computes `lines_changed` (additions + deletions) and `has_issue_ref` via `_detect_issue_references`, with sensible defaults for missing fields.

src/agents/repository_analysis_agent/agent.py (2)
22-40: Graph wiring is clear and linear.

Node registration and edges align with the intended flow.
42-76: Timeout + error handling look solid.

The hard timeout and final state normalization keep execution robust.
src/api/recommendations.py (2)
33-40: Response schema is straightforward.

Clear, stable shape for clients.
149-177: Stub endpoint validation is clear.

Required-field checks make the stub predictable for Phase 1.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```diff
 ### Context Dependency in Enterprise Policies

-Our analysis of 70+ enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
+Our analysis of 70 + enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
```
Fix spacing around “70+”.
The added space reads like arithmetic and is inconsistent with later “70+ enterprise policies”.
✏️ Proposed fix
-Our analysis of 70 + enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
+Our analysis of 70+ enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
🤖 Prompt for AI Agents
In `@docs/benchmarks.md` at line 9, The phrase "70 + enterprise policies" contains
an extra space and should be normalized to "70+ enterprise policies" to match
the later usage; update the string "Our analysis of 70 + enterprise policies..."
to remove the space between 70 and + so it reads "70+ enterprise policies"
(leave surrounding text and "85% of real-world governance policies require
context" unchanged).
```diff
@@ -1,5 +1,5 @@
 """
-Rule Feasibility Agent implementation with error handling and retry logic.
+Rule Feasibility Agent implementation with error handling  and retry logic.
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix minor typo in docstring (double space).
There’s an extra space between “handling” and “and.”
✏️ Proposed fix
-Rule Feasibility Agent implementation with error handling  and retry logic.
+Rule Feasibility Agent implementation with error handling and retry logic.
🤖 Prompt for AI Agents
In `@src/agents/feasibility_agent/agent.py` at line 2, Fix the double space in the
module/class docstring: locate the module-level docstring at the top of
src/agents/feasibility_agent/agent.py (and any docstring on the FeasibilityAgent
class if present) that currently reads "error handling  and retry logic." and
remove the extra space so it reads "error handling and retry logic."
```diff
 def parse_github_repo_identifier(identifier: str) -> str:
     """
-    Normalize a GitHub repository identifier.
-
-    Accepts:
-    - owner/repo
-    - https://github.com/owner/repo
-    - https://github.com/owner/repo.git
-    - owner/repo/
+    Normalizes various GitHub identifiers into 'owner/repo' format.
+    Used by tests to verify repository strings.
     """
-    raw = (value or "").strip()
-    if not raw:
-        return ""
-
-    if raw.startswith("https://") or raw.startswith("http://"):
-        parts = raw.split("/")
-        try:
-            gh_idx = parts.index("github.com")
-        except ValueError:
-            # Could be enterprise; keep as-is and let API validation fail.
-            return raw.rstrip("/").removesuffix(".git")
-
-        owner = parts[gh_idx + 1] if len(parts) > gh_idx + 1 else ""
-        repo = parts[gh_idx + 2] if len(parts) > gh_idx + 2 else ""
-        return f"{owner}/{repo}".rstrip("/").removesuffix(".git")
-
-    return raw.rstrip("/").removesuffix(".git")
+    # Remove protocol and domain
+    clean_id = identifier.replace("https://github.com/", "").replace("http://github.com/", "")
+
+    # Remove .git suffix and trailing slashes
+    clean_id = clean_id.replace(".git", "").strip("/")
+
+    return clean_id
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if giturlparse is used anywhere in the codebase
rg -n "giturlparse" --type py
```
Repository: warestack/watchflow
Length of output: 182

🏁 Script executed:
```bash
#!/bin/bash
# Check if giturlparse is in project dependencies
fd -e "requirements" -e "txt" -e "toml" -e "cfg" -e "ini" | xargs grep -l "giturlparse" 2>/dev/null || echo "giturlparse not found in dependency files"
```
Repository: warestack/watchflow
Length of output: 77

🏁 Script executed:
```bash
#!/bin/bash
# Find and examine the parse_github_repo_identifier function usage
rg -n "parse_github_repo_identifier" --type py -A 2 -B 2
```
Repository: warestack/watchflow
Length of output: 1850

🏁 Script executed:
```bash
#!/bin/bash
# Check for SSH URL patterns that might be handled
rg -n "git@github" --type py
```
Repository: warestack/watchflow
Length of output: 146

🏁 Script executed:
```bash
#!/bin/bash
# Check pyproject.toml for giturlparse dependency
cat pyproject.toml | grep -A 50 "dependencies" | head -60
```
Repository: warestack/watchflow
Length of output: 1566

🏁 Script executed:
```bash
#!/bin/bash
# Check the test file to understand expected behavior
cat -n tests/unit/agents/test_repository_analysis_models.py | head -30
```
Repository: warestack/watchflow
Length of output: 1296

🏁 Script executed:
```bash
#!/bin/bash
# Look at the full models.py file to see the complete context
cat -n src/agents/repository_analysis_agent/models.py | head -50
```
Repository: warestack/watchflow
Length of output: 2507
Use giturlparse to handle SSH and HTTPS URLs consistently.
The function has access to giturlparse (a project dependency), but the implementation only handles HTTPS URLs. SSH URLs like git@github.com:owner/repo.git will not be parsed correctly and would be returned unparsed. Since giturlparse is already a dependency, it should be used to properly handle both URL formats and edge cases.
Suggested implementation
+from giturlparse import parse as parse_git_url
+
+
def parse_github_repo_identifier(identifier: str) -> str:
"""
Normalizes various GitHub identifiers into 'owner/repo' format.
Used by tests to verify repository strings.
"""
- # Remove protocol and domain
- clean_id = identifier.replace("https://github.com/", "").replace("http://github.com/", "")
-
- # Remove .git suffix and trailing slashes
- clean_id = clean_id.replace(".git", "").strip("/")
-
- return clean_id
+ parsed = parse_git_url(identifier)
+ if parsed.owner and parsed.repo:
+ return f"{parsed.owner}/{parsed.repo}"
+ # Fallback: assume already in owner/repo format
+ return identifier.replace(".git", "").strip("/")🤖 Prompt for AI Agents
In `@src/agents/repository_analysis_agent/models.py` around lines 6 - 17, Replace
the manual string hacks in parse_github_repo_identifier with giturlparse
parsing: call giturlparse.parse(identifier), and if the result is valid use
f"{parsed.owner}/{parsed.repo}" (or parsed.owner + "/" + parsed.name if repo is
empty) to normalize SSH and HTTPS forms; if giturlparse reports an invalid URL,
fall back to the existing strip/remove logic (remove protocol/domain, ".git",
and trailing slashes) or raise a clear ValueError to surface bad input. Ensure
you update parse_github_repo_identifier to import and use giturlparse so SSH
URLs like "[email protected]:owner/repo.git" and https URLs are handled
consistently.
@MT-superdev is this a leftover?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```python
# Token extraction—fragile if header format changes.
scheme, token = auth_header.split()
if scheme.lower() != "bearer":
    return None

# TODO: Wire to real IdP (Supabase/Auth0). For now, fake user if token present. WARNING: Must verify signature in prod.
return User(id=123, username="authenticated_user", email="[email protected]", github_token=token)
except Exception as e:
```
Bearer parsing currently grants “authenticated” status to any token.
Any string becomes a fixed User (same id/email), which (a) lets callers bypass anonymous rate limits and (b) collapses all “authenticated” users into one bucket. If the intent is real auth, please validate the token (IdP/GitHub) and populate a real identity; otherwise, treat unverified tokens as anonymous to avoid elevating privileges by default.
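A minimal sketch of verifying the token against GitHub before trusting it. This assumes `httpx` is available; the endpoint and field names follow GitHub's public REST API, and the `User` fields match `src/core/models.py`:

```python
import httpx

from src.core.models import User


async def verify_github_token(token: str) -> User | None:
    """Return a real identity on success, or None so the caller stays anonymous."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.github.com/user",
            headers={"Authorization": f"Bearer {token}"},
        )
    if resp.status_code != 200:
        return None
    data = resp.json()
    return User(
        id=data["id"],
        username=data["login"],
        email=data.get("email"),
        avatar_url=data.get("avatar_url"),
        github_token=token,
    )
```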
🤖 Prompt for AI Agents
In `@src/api/dependencies.py` around lines 34 - 41, The bearer parsing currently
accepts any token and returns a fixed User; update the logic around auth_header
/ scheme / token (the block that does scheme, token = auth_header.split() and
returns User(..., github_token=token)) so that you do not grant a fixed
authenticated identity for arbitrary strings: either validate the token with the
real IdP or GitHub (verify JWT signature or call the appropriate
token-introspect/verify endpoint) and populate User with the real identity on
success, or treat unverified tokens as anonymous and return None/anonymous user
instead of User(id=123,...). Replace the TODO stub with a call to a verification
helper (e.g., verify_idp_token or verify_github_token) and only construct/return
User when that verification passes.
```python
from pydantic import BaseModel, Field


class EventType(Enum):
    """Supported GitHub event types."""


class User(BaseModel):
    """
    Represents an authenticated user in the system.
    Used for dependency injection in API endpoints.
    """

    id: int
    username: str
    email: str | None = None
    avatar_url: str | None = None
    # storing the token allows the service layer to make requests on behalf of the user
    github_token: str | None = Field(None, description="OAuth token for GitHub API access")
```
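The review's `src/core/models.py` fix (see the AI-agent prompt earlier) recommends switching this field to `SecretStr`; a minimal sketch of that change against the fields shown above:

```python
from pydantic import BaseModel, Field, SecretStr


class User(BaseModel):
    id: int
    username: str
    email: str | None = None
    avatar_url: str | None = None
    github_token: SecretStr | None = Field(
        default=None,
        description="OAuth token for GitHub API access",
        exclude=True,  # omitted from model_dump()/serialization
        repr=False,    # omitted from repr() output
    )


user = User(id=1, username="octocat", github_token=SecretStr("example-token"))
print(user.model_dump())  # github_token is excluded
if user.github_token is not None:
    print(user.github_token.get_secret_value())  # explicit access only
```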
🧩 Analysis chain
🏁 Script executed:
```bash
# First, let's read the complete User model definition
cat -n src/core/models.py
```
Repository: warestack/watchflow
Length of output: 2374

🏁 Script executed:
```bash
# Search for model_dump() calls on User instances
rg "model_dump\(\)" --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 45

🏁 Script executed:
```bash
# Search for places where User model is serialized or logged
rg "User" --type py -B 1 -A 1 | grep -E "(model_dump|json|serialize|log)" | head -20
```
Repository: warestack/watchflow
Length of output: 201

🏁 Script executed:
```bash
# Check if there are any logging patterns that might serialize the User model
rg "logger|print|log" src/core/models.py --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 225

🏁 Script executed:
```bash
# Search for token-related fields in the codebase to see if there's a pattern
rg "token" --type py | grep -i "field\|str" | head -20
```
Repository: warestack/watchflow
Length of output: 2211

🏁 Script executed:
```bash
# Search for User model being used in endpoints or response models
rg "response_model.*User|User.*response_model" --type py
```
Repository: warestack/watchflow
Length of output: 45

🏁 Script executed:
```bash
# Search for endpoints that return User or use User in routes
rg "def.*\(.*User|-> User" --type py -B 2 -A 5
```
Repository: warestack/watchflow
Length of output: 1881

🏁 Script executed:
```bash
# Check for FastAPI route definitions that might serialize User
rg "@.*\(|@router" --type py -A 5 | grep -E "(def|User|response_model)" | head -30
```
Repository: warestack/watchflow
Length of output: 2627

🏁 Script executed:
```bash
# Look for any JSON serialization patterns with User
rg "json|serialize|dump" src/ --type py | grep -i user
```
Repository: warestack/watchflow
Length of output: 45

🏁 Script executed:
```bash
# Check the main application file to see how User is used
find . -name "main.py" -o -name "app.py" -o -name "__main__.py" | head -5
```
Repository: warestack/watchflow
Length of output: 76

🏁 Script executed:
```bash
# Search for dependency injection patterns where User is provided
rg "Depends\|get_current_user" --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 45

🏁 Script executed:
```bash
# Check the api/dependencies.py file to understand User creation
cat -n src/api/dependencies.py
```
Repository: warestack/watchflow
Length of output: 2193

🏁 Script executed:
```bash
# Search for all function definitions in api directory to see return types
rg "async def|def " src/api/ --type py -A 1 | grep -E "(def |-> )" | head -30
```
Repository: warestack/watchflow
Length of output: 992

🏁 Script executed:
```bash
# Check if User is ever returned as response_model in any endpoint
rg "response_model|return.*User|User\(" src/api/ --type py -B 1 -A 1
```
Repository: warestack/watchflow
Length of output: 569

🏁 Script executed:
```bash
# Search for any serialization or logging of User objects
rg "str\(user|user\.|\.model_dump|\.dict\(" src/api/ --type py -B 1 -A 1
```
Repository: warestack/watchflow
Length of output: 466

🏁 Script executed:
```bash
# Check where github_token is actually used from the User object
rg "user\.github_token|github_token" src/api/ --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 503

🏁 Script executed:
```bash
# Check if User objects are ever returned in any endpoint responses
rg "return user|return.*User" src/api/ --type py -B 3 -A 1
```
Repository: warestack/watchflow
Length of output: 728

🏁 Script executed:
```bash
# Verify if FastAPI response_model includes User anywhere
rg "response_model" src/api/ --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 357

🏁 Script executed:
```bash
# Check if there's any JSON response serialization of User
rg "JSONResponse|json\.dumps" src/api/ --type py -B 2 -A 2
```
Repository: warestack/watchflow
Length of output: 45

🏁 Script executed:
```bash
# Check all endpoints to see their return types
find src/api -name "*.py" -exec grep -l "async def\|def " {} \; | xargs -I {} sh -c 'echo "=== {} ===" && grep -A 3 "async def\|def " {}'
```
Repository: warestack/watchflow
Length of output: 2278

🏁 Script executed:
```bash
# Check if there's any logging that might include User objects
rg "logger\.|print\(" src/api/ --type py -B 1 -A 1 | grep -C 2 "user"
```
Repository: warestack/watchflow
Length of output: 237

🏁 Script executed:
```bash
# Check the actual endpoint that uses user parameter to see what happens with it
cat -n src/api/recommendations.py | head -50
```
Repository: warestack/watchflow
Length of output: 1834

🏁 Script executed:
```bash
# Search for any debugger or pdb usage that might print User objects
rg "pdb|breakpoint|print\(.*user" --type py
```
Repository: warestack/watchflow
Length of output: 45
🏁 Script executed:
# Check all files in the codebase to see if github_token is accessed from User objects
rg "\.github_token|user\.github_token" --type pyRepository: warestack/watchflow
Length of output: 45
🏁 Script executed:
# Check recommendations.py in detail to see the full endpoint implementation
cat -n src/api/recommendations.py | sed -n '50,100p'Repository: warestack/watchflow
Length of output: 2281
🏁 Script executed:
# Check if User object is ever passed to any logging or debugging functions
rg "logger\.(debug|info|warning|error)\(" src/api/ --type py -B 1 -A 1 | head -30Repository: warestack/watchflow
Length of output: 707
🏁 Script executed:
# Verify Pydantic version to confirm SecretStr and exclude behavior
rg "pydantic" pyproject.toml requirements.txt setup.py 2>/dev/null || echo "No version file found"Repository: warestack/watchflow
Length of output: 122
🏁 Script executed:
# Check if there are any other models that might handle secrets differently
rg "SecretStr|exclude=True" --type pyRepository: warestack/watchflow
Length of output: 45
Secure github_token using SecretStr with exclude=True.
While github_token is not currently serialized in API responses, it will be included in model_dump() by default if called. Use SecretStr and exclude it from serialization to follow Pydantic security best practices and prevent accidental exposure through future logging or debugging.
🛡️ Proposed fix
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, SecretStr
@@
-    github_token: str | None = Field(None, description="OAuth token for GitHub API access")
+    github_token: SecretStr | None = Field(
+        None,
+        description="OAuth token for GitHub API access",
+        exclude=True,
+        repr=False,
+    )

🤖 Prompt for AI Agents
In `@src/core/models.py` around lines 4 - 18, Change the User.github_token field
to use pydantic's SecretStr and mark it excluded from serialization;
specifically import SecretStr and update the github_token declaration in class
User to use type SecretStr with Field(default=None, description="OAuth token for
GitHub API access", exclude=True, repr=False) so the secret is stored securely
and omitted from model_dump()/serialization and reprs.
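
To make the effect of the proposed change concrete, a small self-contained sketch of standard Pydantic v2 behavior (the model here is trimmed to three fields for brevity; the field shape mirrors the diff above):

from pydantic import BaseModel, Field, SecretStr


class User(BaseModel):
    id: int
    username: str
    github_token: SecretStr | None = Field(None, exclude=True, repr=False)


user = User(id=1, username="octocat", github_token=SecretStr("ghp_example"))

print(user.model_dump())  # {'id': 1, 'username': 'octocat'} - token excluded
print(user)               # repr=False also keeps the field out of repr output
if user.github_token is not None:
    raw = user.github_token.get_secret_value()  # explicit opt-in to read the raw value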
import httpx
import structlog

logger = structlog.get_logger()
Dead code: httpx.HTTPStatusError handler will never trigger.
This method uses aiohttp for HTTP requests but catches httpx.HTTPStatusError. Since aiohttp raises aiohttp.ClientResponseError (not httpx exceptions), lines 1031-1039 are dead code.
Also, importing httpx inside the function (line 950) is unnecessary and misleading.
🐛 Proposed fix
-import httpx
 import structlog

 logger = structlog.get_logger()

 # ... rest of method ...
-        except httpx.HTTPStatusError as e:
-            logger.error(
-                "pr_fetch_http_error",
-                repo=repo_full_name,
-                status_code=e.response.status_code,
-                error_type="network_error",
-                error=str(e),
-            )
-            return []
         except Exception as e:
             logger.error("pr_fetch_unexpected_error", repo=repo_full_name, error_type="unknown_error", error=str(e))
             return []

Also applies to: 1031-1039
🤖 Prompt for AI Agents
In `@src/integrations/github/api.py` around lines 950 - 953, Inside the function
that performs HTTP requests in src/integrations/github/api.py remove the local
"import httpx" and the except block catching httpx.HTTPStatusError (this is dead
because the code uses aiohttp); instead catch aiohttp.ClientResponseError (or
aiohttp.ClientError for broader coverage) in the same except block and log the
caught exception via the existing logger (structlog.get_logger()) so the error
details are recorded; locate the try/except around the aiohttp request and
replace the httpx-specific handler with the aiohttp exception type and remove
the unused import.
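
A minimal sketch of the corrected handler shape, assuming the method holds an aiohttp.ClientSession (the function name, URL parameter, and return type here are illustrative, not the repo's actual signatures):

import aiohttp
import structlog

logger = structlog.get_logger()


async def fetch_pull_requests(session: aiohttp.ClientSession, url: str, repo_full_name: str) -> list:
    try:
        async with session.get(url) as resp:
            resp.raise_for_status()  # raises aiohttp.ClientResponseError on 4xx/5xx
            return await resp.json()
    except aiohttp.ClientResponseError as e:
        # aiohttp's HTTP-error type (note: .status, not httpx's .response.status_code)
        logger.error(
            "pr_fetch_http_error",
            repo=repo_full_name,
            status_code=e.status,
            error_type="network_error",
            error=str(e),
        )
        return []
    except aiohttp.ClientError as e:
        # broader transport failures: DNS errors, connection resets, timeouts
        logger.error("pr_fetch_client_error", repo=repo_full_name, error=str(e))
        return []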
src/integrations/github/service.py
Outdated
async with httpx.AsyncClient() as client:
    for filepath in files_to_check:
        # Tricky: Public repos can be read without auth, Private need auth
        # We use the 'contents' API
        check_url = f"{self.BASE_URL}/repos/{owner}/{repo}/contents/{filepath}"
        resp = await client.get(check_url, headers=headers)
        if resp.status_code == 200:
            found_files.append(filepath)
Surface repo-level errors instead of silently treating them as missing files.
Right now any non‑200 (invalid token, rate limit, or missing repo) is interpreted as “file missing,” which can return misleading recommendations and hide real failures. Consider a preflight repo check (or explicit 401/403 handling) before the file loop so repo/rate-limit issues are surfaced.
🛠️ Suggested fix (preflight repo validation)
 async with httpx.AsyncClient() as client:
+    repo_check_url = f"{self.BASE_URL}/repos/{owner}/{repo}"
+    repo_resp = await client.get(repo_check_url, headers=headers)
+    if repo_resp.status_code == 404:
+        raise GitHubResourceNotFoundError(f"Repo {owner}/{repo} not found")
+    if repo_resp.status_code == 403 and "rate limit" in repo_resp.text.lower():
+        raise GitHubRateLimitError("GitHub API rate limit exceeded")
+    repo_resp.raise_for_status()
+
     for filepath in files_to_check:
         # Tricky: Public repos can be read without auth, Private need auth
         # We use the 'contents' API
         check_url = f"{self.BASE_URL}/repos/{owner}/{repo}/contents/{filepath}"
         resp = await client.get(check_url, headers=headers)
         if resp.status_code == 200:
             found_files.append(filepath)

🤖 Prompt for AI Agents
found_files.append(filepath)🤖 Prompt for AI Agents
In `@src/integrations/github/service.py` around lines 66 - 74, The current loop
treating any non-200 from the contents API as "file missing" hides repo-level
errors (auth, rate limits, missing repo); add a preflight repo validation before
iterating files: call the repo endpoint (e.g.,
f"{self.BASE_URL}/repos/{owner}/{repo}") or an authenticated HEAD on the repo to
check for 200 and explicitly handle 401/403/404/rate-limit responses (raise or
return an error) rather than appending to found_files; keep the existing
per-file checks only after the preflight succeeds and reuse headers and the
AsyncClient used around files_to_check to avoid duplicate clients.
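
Note that the preflight snippet assumes GitHubResourceNotFoundError and GitHubRateLimitError exist; if the integration has no such error types yet, a minimal sketch could be (class names assumed for illustration):

class GitHubServiceError(Exception):
    """Base class for GitHub integration failures."""


class GitHubResourceNotFoundError(GitHubServiceError):
    """Raised when the repository or another resource returns 404."""


class GitHubRateLimitError(GitHubServiceError):
    """Raised when GitHub reports that the API rate limit was exceeded."""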
 # We explicitly allow all origins ("*") to prevent the browser from blocking requests
 # from your local file system or different localhost ports.

 app.add_middleware(
     CORSMiddleware,
-    allow_origins=config.cors.origins,
+    allow_origins=["*"],  # Explicitly allow all for development
     allow_credentials=True,
     allow_methods=["*"],
-    allow_headers=config.cors.headers,
+    allow_headers=["*"],
🧩 Analysis chain
🌐 Web query:
CORS wildcard origins credentials allow_credentials browser specification
💡 Result:
- You cannot combine `Access-Control-Allow-Origin: *` with credentialed CORS requests. If the browser sends a CORS request with credentials (cookies, HTTP auth, TLS client certs, e.g. `fetch(..., { credentials: "include" })` / `xhr.withCredentials = true`), then the response must not use the wildcard `*` for `Access-Control-Allow-Origin`; browsers will block JS access to the response and surface a CORS error. [1], [2]
- What the spec requires instead (credentialed CORS):
  - Respond with one explicit origin: `Access-Control-Allow-Origin: <value of the request Origin header>`
  - And include: `Access-Control-Allow-Credentials: true` (the only valid value) [1], [3]
- Preflight note: CORS preflight (`OPTIONS`) requests "must never include credentials"; the server indicates whether the actual request may include credentials by returning `Access-Control-Allow-Credentials: true`. [4]
Sources:
[1] W3C CORS (2020 snapshot): wildcard * “cannot be used for a resource that supports credentials” and credentialed responses must echo Origin + Access-Control-Allow-Credentials: true.
[2] MDN CORS error: credentials not supported when Access-Control-Allow-Origin is *.
[3] MDN Access-Control-Allow-Credentials: credentials allowed only when server sends Access-Control-Allow-Credentials: true.
[4] MDN CORS guide: preflight requests must never include credentials; server must return Access-Control-Allow-Credentials: true to allow credentialed actual request.
🏁 Script executed:
# Check if there's more context in the repository about this CORS configuration
find . -type f -name "*.py" | head -20 | xargs grep -l "CORSMiddleware" 2>/dev/null | head -5
Length of output: 45
🏁 Script executed:
# Get the actual file context around the lines mentioned
if [ -f "src/main.py" ]; then
head -n 100 src/main.py | tail -n 30
fi
Length of output: 773
🏁 Script executed:
# Search for FastAPI/Starlette CORSMiddleware implementation or behavior with wildcard + credentials
rg -i "allow_credentials.*allow_origins|allow_origins.*allow_credentials" --type py -A 5 -B 5Repository: warestack/watchflow
Length of output: 45
🌐 Web query:
FastAPI Starlette CORSMiddleware allow_origins wildcard allow_credentials=True behavior
💡 Result:
In Starlette/FastAPI CORSMiddleware, setting:

    allow_origins=["*"]
    allow_credentials=True

is effectively "self-contradictory" for browsers: credentialed CORS responses must not use `Access-Control-Allow-Origin: *`; they must echo a specific origin. Browsers will block credentialed responses if `*` is used. [3], [4]

What Starlette actually does (current behavior)

Starlette computes:
- `allow_all_origins = True` (because `*` is in `allow_origins`)
- `preflight_explicit_allow_origin = True` (because `allow_credentials=True`) [2]

Preflight (OPTIONS with Origin + Access-Control-Request-Method):
- Starlette echoes the request origin into `Access-Control-Allow-Origin: <requested_origin>` (since all origins are "allowed" when `*` is configured), and includes `Access-Control-Allow-Credentials: true`. [2]

Non-preflight ("simple") responses:
- Starlette initially sets `Access-Control-Allow-Origin: *` and `Access-Control-Allow-Credentials: true`. [2]
- But if the request includes a `Cookie` header, Starlette overrides `Access-Control-Allow-Origin` to the explicit request origin and adds `Vary: Origin`. [2]

Practical implication
- If your credentialed requests rely on cookies, Starlette will typically echo the origin (because it detects `Cookie`). [2]
- If your credentialed requests are "credentialed" in the browser sense but don't include a `Cookie` header (e.g., you rely on other credential modes / auth behaviors), you can still end up with `Access-Control-Allow-Origin: *` on the actual response, which browsers reject for credentialed CORS. [3], [4]

Recommended configuration
If you want credentials, don't use `*`; list explicit origins (or use an allowlist regex), per Starlette's own docs. [1], [4]

Sources: Starlette CORSMiddleware docs [1], FastAPI/Starlette CORSMiddleware source listing [2], MDN CORS credentialed-wildcard rules [4] and related error explanation [3].
Wildcard origins + credentials violates CORS spec and will break credentialed requests.
Browsers reject Access-Control-Allow-Origin: * when credentials are allowed (per CORS spec). Starlette's CORSMiddleware with this configuration behaves unpredictably: it echoes the origin for preflight and cookie-based requests but may still respond with * for non-cookie credentialed requests, which browsers will block. Use explicit origins for credentialed requests or disable credentials.
🔒 Suggested fix (choose one)
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],  # Explicitly allow all for development
-    allow_credentials=True,
+    allow_origins=["*"],  # dev only; use explicit origins in prod
+    allow_credentials=False,
     allow_methods=["*"],
     allow_headers=["*"],
 )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# We explicitly allow all origins ("*") to prevent the browser from blocking requests
# from your local file system or different localhost ports.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # dev only; use explicit origins in prod
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)
🤖 Prompt for AI Agents
In `@src/main.py` around lines 88 - 96, The current CORSMiddleware setup
(app.add_middleware with CORSMiddleware and allow_origins=["*"],
allow_credentials=True) violates the CORS spec; either replace the wildcard
origins with an explicit list of allowed origins (set allow_origins to the exact
host(s) you expect) when keeping allow_credentials=True, or if you must keep
wildcard origins set allow_credentials=False; update the CORSMiddleware config
accordingly in the app.add_middleware call to use explicit origins or disable
credentials so browsers will accept the responses.
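
If credentialed requests are genuinely needed, the spec-compliant alternative is an explicit allowlist rather than the wildcard; a sketch of that shape (the origin values are placeholders, not the project's real hosts):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    # Browsers require a concrete origin to be echoed back whenever
    # Access-Control-Allow-Credentials is true, so no wildcard here.
    allow_origins=[
        "http://localhost:3000",    # placeholder dev frontend
        "https://app.example.com",  # placeholder production frontend
    ],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)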
import os
import sys
from pathlib import Path

import pytest

# 1. Ensure src is in path
ROOT = Path(__file__).resolve().parent.parent
SRC = ROOT / "src"

if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))


# 2. Helper for environment mocking
class Helpers:
    @staticmethod
    def mock_env(env_vars):
        from unittest.mock import patch

        return patch.dict(os.environ, env_vars)


@pytest.fixture
def helpers():
    return Helpers


# 3. Mock Environment Variables (Security First)
# We do this BEFORE importing app code to ensure no real secrets are read
@pytest.fixture(autouse=True)
def mock_settings():
    """Forces the test environment to use dummy values."""
    with Helpers.mock_env(
        {
            "APP_CLIENT_ID_GITHUB": "mock-client-id",
            "APP_CLIENT_SECRET_GITHUB": "mock-client-secret",
            "WEBHOOK_SECRET_GITHUB": "mock-webhook-secret",
            "PRIVATE_KEY_BASE64_GITHUB": "bW9jay1rZXk=",  # "mock-key" in base64
            "AI_PROVIDER": "openai",
            "OPENAI_API_KEY": "sk-mock-key",
            "ENVIRONMENT": "test",
        }
    ):
        yield
Avoid hard-coded base64 key to prevent secret-scanner false positives.
The literal on Line 42 is already flagged by Gitleaks; computing it at runtime avoids this while keeping validity.
🔧 Proposed fix
-import os
+import base64
+import os
@@
-            "PRIVATE_KEY_BASE64_GITHUB": "bW9jay1rZXk=",  # "mock-key" in base64
+            "PRIVATE_KEY_BASE64_GITHUB": base64.b64encode(b"mock-key").decode(),

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import base64
import os
import sys
from pathlib import Path

import pytest

# 1. Ensure src is in path
ROOT = Path(__file__).resolve().parent.parent
SRC = ROOT / "src"

if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))


# 2. Helper for environment mocking
class Helpers:
    @staticmethod
    def mock_env(env_vars):
        from unittest.mock import patch

        return patch.dict(os.environ, env_vars)


@pytest.fixture
def helpers():
    return Helpers


# 3. Mock Environment Variables (Security First)
# We do this BEFORE importing app code to ensure no real secrets are read
@pytest.fixture(autouse=True)
def mock_settings():
    """Forces the test environment to use dummy values."""
    with Helpers.mock_env(
        {
            "APP_CLIENT_ID_GITHUB": "mock-client-id",
            "APP_CLIENT_SECRET_GITHUB": "mock-client-secret",
            "WEBHOOK_SECRET_GITHUB": "mock-webhook-secret",
            "PRIVATE_KEY_BASE64_GITHUB": base64.b64encode(b"mock-key").decode(),
            "AI_PROVIDER": "openai",
            "OPENAI_API_KEY": "sk-mock-key",
            "ENVIRONMENT": "test",
        }
    ):
        yield
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 42-42: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In `@tests/conftest.py` around lines 5 - 48, The hard-coded base64 literal in the
mock_settings fixture (PRIVATE_KEY_BASE64_GITHUB) triggers secret scanners;
change it to compute the value at runtime instead (e.g., base64-encode a
non-secret byte string) so tests still get a valid value without a literal.
Update the mock_settings fixture to import/use Python's base64 (or encode
b"mock-key" and decode to str) and set PRIVATE_KEY_BASE64_GITHUB to that
computed string rather than the literal "bW9jay1rZXk=".
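
As a quick sanity check that test behavior is unchanged, the runtime-computed value equals the flagged literal:

import base64

encoded = base64.b64encode(b"mock-key").decode()
assert encoded == "bW9jay1rZXk="  # identical to the previously hard-coded value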
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.