fix: apply allow/exclude_repositories filters to transcript session data#1623
fix: apply allow/exclude_repositories filters to transcript session data#1623Siddhant-K-code wants to merge 7 commits into
Conversation
Repository allow/exclude filters previously only gated checkpoints (src/commands/git_ai_handlers.rs); session (transcript) MetricEvents were emitted regardless of the configured repository lists. A customer expected these filters to also scope session data. Filter session events in the transcript streaming pipeline, mirroring the checkpoint-time filter: - Add `session_repo_allowed(config, work_dir)` and an early bail-out in `process_session_blocking`, placed after the work_dir is resolved and before any events are emitted or the watermark is advanced. On a skip we emit nothing and intentionally do NOT advance the watermark, so a backlog re-flows if the filters later change. Reads `Config::fresh()` so a running daemon observes config changes without a restart. - Reuse `Config::is_allowed_repository_with_remotes` so semantics match the checkpoint path exactly: exclusions take precedence, an empty allowlist allows everything, and an active allowlist fails closed when the repo can't be verified (no work_dir, not a git repo, or no remote). Shared streams (e.g. Copilot OTEL) never carry a repo URL, so they fall into this fail-closed path under an allowlist and pass under exclude-only filters. Test support: - Add allow_repositories/exclude_repositories to ConfigPatch and apply_test_config_patch, and wire them into the TestRepo home config writer. - Add Config::with_repository_filters_for_test. - Add 6 unit tests covering no-filters, allowlist match/miss, exclude precedence, and fail-closed/exclude-only unknown-repo behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Follow-up on Devin review Excluded sessions cause repeated expensive I/O on every sweep cycle Fixed in Added a regression test verifying the metadata update marks the file seen without advancing the watermark. Shared streams are unconditionally dropped under an active allowlist This behavior is intentional and security-correct: shared streams like Copilot OTEL do not carry enough repo context to verify against Also fixed two unrelated newer-Clippy blockers on this branch ( Verification run locally: ~/.cargo/bin/cargo fmt
~/.cargo/bin/cargo test metadata_update_marks_file_seen_without_advancing_watermark --lib -- --test-threads 1
~/.cargo/bin/cargo test session_repo_allowed --lib -- --test-threads 1
~/.cargo/bin/cargo clippy --all-targets -- -D warningsPushed to |
|
Final follow-up on the review comments and CI:
Validation: cargo check --tests
cargo fmt -- --check
cargo clippy --all-targets -- -D warnings
cargo test --lib session_repo_allowed -- --test-threads 12
cargo test --lib filter_skip -- --test-threads 12
GIT_AI_TEST_SHARED_DAEMON_POOL_SIZE=12 cargo test -- --test-threads 12CI is green for the visible PR checks, including Ubuntu, macOS, and Windows test jobs. |
Closes #1605
Problem
A customer expected their
allow_repositories/exclude_repositoriesfilters to also apply to session (transcript) data, but those filters only gated checkpoints (Config::is_allowed_repositoryinsrc/commands/git_ai_handlers.rs). Session/transcriptMetricEvents (SessionEvent,OtelTrace) were emitted and uploaded regardless of the configured repository lists.The transcript pipeline already resolves the session's repository (hook
repo_work_dir> DB > inferred cwd) in order to stamprepo_urlon each event — it just never consulted the filters. This is effectively a bug fix: the boilerplate already existed.Fix
In
src/daemon/stream_worker.rs:session_repo_allowed(config, work_dir)helper.process_session_blocking, placed right after the work_dir is resolved and before any events are emitted or the watermark is advanced. On a skip we emit nothing and intentionally do not advance the watermark, so a backlog re-flows if the filters later change.Config::fresh()so a long-lived daemon observes config changes without a restart.Config::is_allowed_repository_with_remotes, so semantics match the checkpoint path exactly: exclusions take precedence, an empty allowlist allows everything, and an active allowlist fails closed when the repo can't be verified (no work_dir, not a git repo, or no remote).A single filter covers both entry points (checkpoint-triggered notifications and the periodic sweep) since both flow through
process_session_blocking.Decisions
resolved_work_diris forcedNonefor them, and OTEL spans contain no path/cwd. They therefore fall into the unknown-repo path: dropped under an active allowlist, passed under exclude-only filters. Worth noting: settingallow_repositoriesdisables Copilot OTEL telemetry, which is the security-correct behavior.Tests
src/daemon/stream_worker_tests.rsexercising the realsession_repo_allowedagainst real git repos, realConfig, and real glob matching: no-filters fast path, allowlist match/miss, exclude precedence, fail-closed unknown-repo under allowlist, and exclude-only unknown-repo pass-through.allow_repositories/exclude_repositoriestoConfigPatch+apply_test_config_patch, wired into theTestRepohome-config writer (enables daemon-level config patching in future integration tests).Config::with_repository_filters_for_test.task build,task lint,task fmtall clean; new tests pass; config/streams suites show no regressions.How to verify
Automated (unit tests for the filter decision)
Manual end-to-end (real daemon)
Expected:
SessionEvent/OtelTracemetrics emitted; watermark not advanced.🤖 Generated with Claude Code