feat: add GIT_AI_INTERNAL_DIR to isolate per-machine git-ai state on shared NFS homes#1650
Open
jwiegley wants to merge 6 commits into
Open
feat: add GIT_AI_INTERNAL_DIR to isolate per-machine git-ai state on shared NFS homes#1650jwiegley wants to merge 6 commits into
jwiegley wants to merge 6 commits into
Conversation
When multiple machines share an NFS home directory, git-ai's per-user runtime state under ~/.git-ai/internal -- the four WAL-mode SQLite databases, the daemon control/trace sockets, daemon.lock, and pid file -- collides across hosts, risking lock contention and database corruption. Add a GIT_AI_INTERNAL_DIR environment variable: when set to a non-empty value it IS git-ai's internal directory (verbatim), replacing the default ~/.git-ai/internal. Each host points it at a machine-local path so the hosts never share a database, socket, or lock. Unset / empty / whitespace-only preserves the default behavior byte-for-byte. Implementation: - config::internal_dir_path() honors the var via a shared git_ai_internal_dir_env_override() parser (empty/whitespace => unset). - The three DB modules that resolved their path directly off dirs::home_dir() (authorship db, metrics-db, notes-db) now route through internal_dir_path(); their cfg(test) per-DB path overrides still win. - DaemonConfig::from_env_or_default_paths() makes GIT_AI_INTERNAL_DIR the highest-precedence internal-dir source (> GIT_AI_DAEMON_HOME > default); explicit socket overrides still apply on top. - The detached daemon inherits the var by ordinary environment propagation (the spawn never env_clear()s it nor sanitizes it), so the client, git proxy, and daemon resolve identical socket/lock/DB paths. - The diagnostics trace2 self-check and debug-self-check root now route through internal_dir_path() so all in-scope runtime state relocates. Credentials, config.json, and skills stay shared in ~/.git-ai by design, so login and configuration remain shared across machines. Tests (tests/integration/internal_dir.rs + per-module unit tests): - collision when unset/shared: identical resolved paths, a mutually exclusive daemon lock, and shared DB files; - isolation when set per machine: pairwise- and globally-disjoint paths for two AND three machines, coexisting locks, isolated DB writes; - end-to-end concurrency with two and three real daemons driving real attribution flows with correct line-level authorship after every commit; - env inheritance on detached daemon auto-spawn; - per-DB path rerouting and per-DB override precedence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…ion resolution A fess honesty-audit flagged that two path-only integration tests did not exercise the real resolver: - test_internal_dir_shared_machines_resolve_identical_colliding_paths asserted f(x) == f(x) (determinism), not that the env var is honored. - test_internal_dir_distinct_db_writes_are_isolated wrote to the test's own hardcoded paths instead of resolving through internal_dir_path(), so it would have passed even if GIT_AI_INTERNAL_DIR were ignored. Rewrite both to drive the production resolvers internal_dir_path() and DaemonConfig::from_default_paths() under the env guard and assert the resolved paths equal the configured machine-local dir, so each test now fails if the variable is ever ignored. Also cover the previously untested whitespace-only value (claimed equivalent to unset) in test_internal_dir_empty_or_whitespace_value_is_treated_as_unset. Test-only change; build / fmt / lint clean, 14 integration + 9 unit tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
test_config_fresh_picks_up_api_key_changes and test_api_context_picks_up_api_key_changes build a temp HOME with an empty config and assert api_key().is_none(), but Config resolves the API key from the GIT_AI_API_KEY env var ahead of the on-disk config. On a machine where that var is exported (e.g. a developer's shell) the ambient key leaks in and both tests fail; in CI's clean environment they pass. This is a pre-existing test-isolation gap, unrelated to the GIT_AI_INTERNAL_DIR work. Add an ApiKeyEnvGuard (mirroring the existing HomeEnvGuard) that clears GIT_AI_API_KEY for the duration of these two tests and restores it on drop, so they exercise file-config behavior hermetically regardless of the ambient environment. Verified: both tests now pass with GIT_AI_API_KEY set in the environment; build / fmt / lint clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…(Windows named pipes) test_internal_dir_env_is_used_verbatim_for_all_runtime_paths asserted that every resolved runtime path is "under the internal dir, or the Unix hashed temp fallback". On Windows the daemon control/trace sockets are named pipes (\\.\pipe\git-ai-<hash>-...) -- derived from a hash of the internal dir but not filesystem paths under it -- so the predicate was false and the test failed deterministically on Windows (CI: Test on windows-latest integration 4/6, 592 passed / 1 failed). This was previously and incorrectly attributed to the unrelated 0xC0000142 process-spawn flake seen in shard 6/6. Accept the Windows named-pipe form as a third valid "derived from the internal dir" shape. The `== canonical` assertions just above already prove the client and daemon agree on the exact derived paths on every platform; this predicate is the looser sanity check that needed the Windows case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…production resolvers Addresses three fess-audit findings. #2 (production safety): gate the GIT_AI_TEST_COMPLETION_LOG_DIR test-sync signal behind cfg(any(test, feature = "test-support")) in daemon.rs, so a production RELEASE binary (built without test-support) can no longer be flipped into daemon test mode by an env var. The test daemon is built with --features test-support, so it still honors the signal (verified: e2e + default-mode completion-log sync all pass). Extracted the completion-log-dir resolution into resolve_test_completion_log_dir() (returns None in production). #3 (vacuous / fixture-drift tests): the verbatim test asserted db_paths(&dir)[0] == dir.join("db") -- the test comparing its own arithmetic to itself. Add #[cfg(feature = "test-support")] database_path_for_test() wrappers to InternalDatabase / MetricsDatabase / NotesDatabase and resolve all test DB and runtime paths through the real production resolvers (database_path() + DaemonConfig::from_default_paths()) under each machine's GIT_AI_INTERNAL_DIR, hermetic against ambient per-DB test overrides. The collision / isolation / disjointness tests (two AND three machines) are now env-grounded: they fail if the variable is ever ignored. #4 (fragile / doc): tighten the ApiKeyEnvGuard doc; replace the fragile `git-ai usage` side-effect metrics-db assertion in the e2e tests with the now-deterministic verbatim/unit-test routing proof; fix module-doc drift. build / fmt / lint clean; 14 integration + 9 unit internal_dir tests pass; daemon regression smoke green (one pre-existing 500ms-timeout timing flake, daemon_partial_trace_line_..., passes 3/3 in isolation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…-db e2e witness Two residuals from the adversarial review of cd85206: - transcripts-db was still self-compared: the test reconstructed the literal internal_dir.join("transcripts-db") and asserted it equals the same literal, so a production rename would not be caught (unlike the other three DBs, which route through database_path()). Centralize the literal in DaemonConfig::transcripts_db_path(), used by BOTH the daemon's open path (daemon.rs) and the test helpers, so the assertion now fails on a rename. - Restore one end-to-end metrics-db check (single-machine e2e) via `git-ai usage` as the subprocess-env-propagation witness, with a comment documenting the dependency. The deterministic routing proof remains in the verbatim/unit tests, so this is no longer the sole (fragile) proof -- it adds back the client-subprocess coverage the prior commit dropped. build / fmt / lint clean; 14 integration + 9 unit internal_dir tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
a5c2a9f to
faf840d
Compare
Contributor
Author
This stack of pull requests is managed by Graphite. Learn more about stacking. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Adds a
GIT_AI_INTERNAL_DIRenvironment variable so that multiple machines sharing an NFS home directory can each confine git-ai's volatile, lock-prone state to a machine-local directory — eliminating collisions on the SQLite databases, daemon sockets, and lock files when running git-ai (and its background daemon) concurrently across hosts.Motivation: git-ai keeps per-user runtime state under
~/.git-ai/internal/— four WAL-mode SQLite DBs (+-wal/-shmsidecars), the daemoncontrol.sock/trace2.sock/daemon.lock/daemon.pid.json/logs, plusdistinct_id/update_check. WAL-mode SQLite,flock, and AF_UNIX sockets are all unsafe over NFS / across hosts, so a shared home means lock contention and potential database corruption.Behavior
When
GIT_AI_INTERNAL_DIRis set to a non-empty value it is git-ai's internal directory (verbatim — nointernalsegment appended), replacing the default~/.git-ai/internal. Each host points it at a machine-local path. Unset / empty / whitespace-only preserves current behavior byte-for-byte.Relocated when set: the 4 SQLite DBs (
db,metrics-db,notes-db,transcripts-db), daemon sockets/lock/pid/logs,distinct_id,update_check, checkpoint-debug-logs, debug-self-checks, and the diagnostics trace2 self-check files.Intentionally stays shared in
~/.git-ai(so login/config carry across machines):config.json, OAuthcredentials, andskills/. Per-repo.git/ai/working logs are unaffected (already per-repo).Implementation
config::internal_dir_path()honors the var via a singlegit_ai_internal_dir_env_override()parser (empty / whitespace-only ⇒ unset).dirs::home_dir()(authorship/internal_db.rs,metrics/db.rs,notes/db.rs) now route throughinternal_dir_path(); theircfg(test)per-DB path overrides still take precedence.transcripts-dbis centralized inDaemonConfig::transcripts_db_path(), used by both the daemon and the tests.DaemonConfig::from_env_or_default_paths()makesGIT_AI_INTERNAL_DIRthe highest-precedence internal-dir source (>GIT_AI_DAEMON_HOME> default); the explicit socket overrides still apply on top.env_clear()s nor sanitizes it), so the client, git proxy, and daemon resolve identical socket/lock/DB paths.diagnostics.rs(trace2 self-check) anddiagnostic_sentinels.rs(debug_self_check_root) route throughinternal_dir_path()too.GIT_AI_TEST_COMPLETION_LOG_DIRdaemon signal iscfg(any(test, feature = "test-support"))-gated so a production release binary cannot be coerced into daemon test mode by an env var (the test daemon is built with--features test-support, so it still honors it).Tests
tests/integration/internal_dir.rs+ per-module unit tests demonstrate the requirement on three levels, all grounded in the production resolvers (internal_dir_path(),DaemonConfig::from_default_paths(), each DB'sdatabase_path()) under each machine'sGIT_AI_INTERNAL_DIR— so they fail if the variable is ever ignored — and hermetic against ambient per-DB test overrides:daemon.lock, shared DB files.GIT_AI_INTERNAL_DIR, driving real commit/authorship flows concurrently with correct line-level authorship on each; the collision case where a second daemon on a shared dir cannot start; and the detached-auto-spawn env-inheritance invariant (cross-platform, incl. Windows named pipes).Also includes a small unrelated fix: the api-key tests in
config_fresh_test.rsnow clear an ambientGIT_AI_API_KEYso they are hermetic (a pre-existing test-isolation gap surfaced during the full-suite run; it never affected CI's clean environment).Verification
Full CI matrix green — Ubuntu, macOS, and all six Windows shards, plus Lint / Format / Doc / CodeQL / CLA / Snyk. Local:
task build/task fmt/task lintclean; full suite 5151 tests pass.🤖 Generated with Claude Code
https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR