Skip to content

feat: add GIT_AI_INTERNAL_DIR to isolate per-machine git-ai state on shared NFS homes#1650

Open
jwiegley wants to merge 6 commits into
mainfrom
johnw/git-ai-dir
Open

feat: add GIT_AI_INTERNAL_DIR to isolate per-machine git-ai state on shared NFS homes#1650
jwiegley wants to merge 6 commits into
mainfrom
johnw/git-ai-dir

Conversation

@jwiegley

@jwiegley jwiegley commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a GIT_AI_INTERNAL_DIR environment variable so that multiple machines sharing an NFS home directory can each confine git-ai's volatile, lock-prone state to a machine-local directory — eliminating collisions on the SQLite databases, daemon sockets, and lock files when running git-ai (and its background daemon) concurrently across hosts.

Motivation: git-ai keeps per-user runtime state under ~/.git-ai/internal/ — four WAL-mode SQLite DBs (+ -wal/-shm sidecars), the daemon control.sock/trace2.sock/daemon.lock/daemon.pid.json/logs, plus distinct_id/update_check. WAL-mode SQLite, flock, and AF_UNIX sockets are all unsafe over NFS / across hosts, so a shared home means lock contention and potential database corruption.

Behavior

When GIT_AI_INTERNAL_DIR is set to a non-empty value it is git-ai's internal directory (verbatim — no internal segment appended), replacing the default ~/.git-ai/internal. Each host points it at a machine-local path. Unset / empty / whitespace-only preserves current behavior byte-for-byte.

Relocated when set: the 4 SQLite DBs (db, metrics-db, notes-db, transcripts-db), daemon sockets/lock/pid/logs, distinct_id, update_check, checkpoint-debug-logs, debug-self-checks, and the diagnostics trace2 self-check files.

Intentionally stays shared in ~/.git-ai (so login/config carry across machines): config.json, OAuth credentials, and skills/. Per-repo .git/ai/ working logs are unaffected (already per-repo).

Implementation

  • config::internal_dir_path() honors the var via a single git_ai_internal_dir_env_override() parser (empty / whitespace-only ⇒ unset).
  • The three DB modules that resolved their path directly off dirs::home_dir() (authorship/internal_db.rs, metrics/db.rs, notes/db.rs) now route through internal_dir_path(); their cfg(test) per-DB path overrides still take precedence. transcripts-db is centralized in DaemonConfig::transcripts_db_path(), used by both the daemon and the tests.
  • DaemonConfig::from_env_or_default_paths() makes GIT_AI_INTERNAL_DIR the highest-precedence internal-dir source (> GIT_AI_DAEMON_HOME > default); the explicit socket overrides still apply on top.
  • The detached daemon inherits the var by ordinary environment propagation (the spawn never env_clear()s nor sanitizes it), so the client, git proxy, and daemon resolve identical socket/lock/DB paths.
  • diagnostics.rs (trace2 self-check) and diagnostic_sentinels.rs (debug_self_check_root) route through internal_dir_path() too.
  • The test-only GIT_AI_TEST_COMPLETION_LOG_DIR daemon signal is cfg(any(test, feature = "test-support"))-gated so a production release binary cannot be coerced into daemon test mode by an env var (the test daemon is built with --features test-support, so it still honors it).

Tests

tests/integration/internal_dir.rs + per-module unit tests demonstrate the requirement on three levels, all grounded in the production resolvers (internal_dir_path(), DaemonConfig::from_default_paths(), each DB's database_path()) under each machine's GIT_AI_INTERNAL_DIR — so they fail if the variable is ever ignored — and hermetic against ambient per-DB test overrides:

  • Collision when unset / shared — identical resolved paths, a mutually-exclusive daemon.lock, shared DB files.
  • Isolation when set per machine — pairwise- and globally-disjoint paths for two and three machines, coexisting locks, isolated DB writes.
  • End-to-end concurrency — two and three real daemons (simulated hosts), each with its own GIT_AI_INTERNAL_DIR, driving real commit/authorship flows concurrently with correct line-level authorship on each; the collision case where a second daemon on a shared dir cannot start; and the detached-auto-spawn env-inheritance invariant (cross-platform, incl. Windows named pipes).
  • Per-DB path rerouting + override precedence; empty/whitespace handling.

Also includes a small unrelated fix: the api-key tests in config_fresh_test.rs now clear an ambient GIT_AI_API_KEY so they are hermetic (a pre-existing test-isolation gap surfaced during the full-suite run; it never affected CI's clean environment).

Verification

Full CI matrix green — Ubuntu, macOS, and all six Windows shards, plus Lint / Format / Doc / CodeQL / CLA / Snyk. Local: task build / task fmt / task lint clean; full suite 5151 tests pass.

🤖 Generated with Claude Code

https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@jwiegley jwiegley requested a review from svarlamov June 24, 2026 22:44
jwiegley and others added 6 commits July 1, 2026 17:34
When multiple machines share an NFS home directory, git-ai's per-user
runtime state under ~/.git-ai/internal -- the four WAL-mode SQLite
databases, the daemon control/trace sockets, daemon.lock, and pid file --
collides across hosts, risking lock contention and database corruption.

Add a GIT_AI_INTERNAL_DIR environment variable: when set to a non-empty
value it IS git-ai's internal directory (verbatim), replacing the default
~/.git-ai/internal. Each host points it at a machine-local path so the
hosts never share a database, socket, or lock. Unset / empty /
whitespace-only preserves the default behavior byte-for-byte.

Implementation:
- config::internal_dir_path() honors the var via a shared
  git_ai_internal_dir_env_override() parser (empty/whitespace => unset).
- The three DB modules that resolved their path directly off
  dirs::home_dir() (authorship db, metrics-db, notes-db) now route through
  internal_dir_path(); their cfg(test) per-DB path overrides still win.
- DaemonConfig::from_env_or_default_paths() makes GIT_AI_INTERNAL_DIR the
  highest-precedence internal-dir source (> GIT_AI_DAEMON_HOME > default);
  explicit socket overrides still apply on top.
- The detached daemon inherits the var by ordinary environment
  propagation (the spawn never env_clear()s it nor sanitizes it), so the
  client, git proxy, and daemon resolve identical socket/lock/DB paths.
- The diagnostics trace2 self-check and debug-self-check root now route
  through internal_dir_path() so all in-scope runtime state relocates.

Credentials, config.json, and skills stay shared in ~/.git-ai by design,
so login and configuration remain shared across machines.

Tests (tests/integration/internal_dir.rs + per-module unit tests):
- collision when unset/shared: identical resolved paths, a mutually
  exclusive daemon lock, and shared DB files;
- isolation when set per machine: pairwise- and globally-disjoint paths
  for two AND three machines, coexisting locks, isolated DB writes;
- end-to-end concurrency with two and three real daemons driving real
  attribution flows with correct line-level authorship after every commit;
- env inheritance on detached daemon auto-spawn;
- per-DB path rerouting and per-DB override precedence.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…ion resolution

A fess honesty-audit flagged that two path-only integration tests did not
exercise the real resolver:
- test_internal_dir_shared_machines_resolve_identical_colliding_paths
  asserted f(x) == f(x) (determinism), not that the env var is honored.
- test_internal_dir_distinct_db_writes_are_isolated wrote to the test's own
  hardcoded paths instead of resolving through internal_dir_path(), so it
  would have passed even if GIT_AI_INTERNAL_DIR were ignored.

Rewrite both to drive the production resolvers internal_dir_path() and
DaemonConfig::from_default_paths() under the env guard and assert the
resolved paths equal the configured machine-local dir, so each test now
fails if the variable is ever ignored. Also cover the previously untested
whitespace-only value (claimed equivalent to unset) in
test_internal_dir_empty_or_whitespace_value_is_treated_as_unset.

Test-only change; build / fmt / lint clean, 14 integration + 9 unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
test_config_fresh_picks_up_api_key_changes and
test_api_context_picks_up_api_key_changes build a temp HOME with an empty
config and assert api_key().is_none(), but Config resolves the API key from
the GIT_AI_API_KEY env var ahead of the on-disk config. On a machine where
that var is exported (e.g. a developer's shell) the ambient key leaks in and
both tests fail; in CI's clean environment they pass. This is a pre-existing
test-isolation gap, unrelated to the GIT_AI_INTERNAL_DIR work.

Add an ApiKeyEnvGuard (mirroring the existing HomeEnvGuard) that clears
GIT_AI_API_KEY for the duration of these two tests and restores it on drop,
so they exercise file-config behavior hermetically regardless of the ambient
environment.

Verified: both tests now pass with GIT_AI_API_KEY set in the environment;
build / fmt / lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…(Windows named pipes)

test_internal_dir_env_is_used_verbatim_for_all_runtime_paths asserted that
every resolved runtime path is "under the internal dir, or the Unix hashed
temp fallback". On Windows the daemon control/trace sockets are named pipes
(\\.\pipe\git-ai-<hash>-...) -- derived from a hash of the internal dir but
not filesystem paths under it -- so the predicate was false and the test
failed deterministically on Windows (CI: Test on windows-latest integration
4/6, 592 passed / 1 failed). This was previously and incorrectly attributed to
the unrelated 0xC0000142 process-spawn flake seen in shard 6/6.

Accept the Windows named-pipe form as a third valid "derived from the internal
dir" shape. The `== canonical` assertions just above already prove the client
and daemon agree on the exact derived paths on every platform; this predicate
is the looser sanity check that needed the Windows case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…production resolvers

Addresses three fess-audit findings.

#2 (production safety): gate the GIT_AI_TEST_COMPLETION_LOG_DIR test-sync signal
behind cfg(any(test, feature = "test-support")) in daemon.rs, so a production
RELEASE binary (built without test-support) can no longer be flipped into daemon
test mode by an env var. The test daemon is built with --features test-support,
so it still honors the signal (verified: e2e + default-mode completion-log sync
all pass). Extracted the completion-log-dir resolution into
resolve_test_completion_log_dir() (returns None in production).

#3 (vacuous / fixture-drift tests): the verbatim test asserted
db_paths(&dir)[0] == dir.join("db") -- the test comparing its own arithmetic to
itself. Add #[cfg(feature = "test-support")] database_path_for_test() wrappers to
InternalDatabase / MetricsDatabase / NotesDatabase and resolve all test DB and
runtime paths through the real production resolvers (database_path() +
DaemonConfig::from_default_paths()) under each machine's GIT_AI_INTERNAL_DIR,
hermetic against ambient per-DB test overrides. The collision / isolation /
disjointness tests (two AND three machines) are now env-grounded: they fail if
the variable is ever ignored.

#4 (fragile / doc): tighten the ApiKeyEnvGuard doc; replace the fragile
`git-ai usage` side-effect metrics-db assertion in the e2e tests with the
now-deterministic verbatim/unit-test routing proof; fix module-doc drift.

build / fmt / lint clean; 14 integration + 9 unit internal_dir tests pass; daemon
regression smoke green (one pre-existing 500ms-timeout timing flake,
daemon_partial_trace_line_..., passes 3/3 in isolation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
…-db e2e witness

Two residuals from the adversarial review of cd85206:

- transcripts-db was still self-compared: the test reconstructed the literal
  internal_dir.join("transcripts-db") and asserted it equals the same literal, so
  a production rename would not be caught (unlike the other three DBs, which route
  through database_path()). Centralize the literal in
  DaemonConfig::transcripts_db_path(), used by BOTH the daemon's open path
  (daemon.rs) and the test helpers, so the assertion now fails on a rename.

- Restore one end-to-end metrics-db check (single-machine e2e) via `git-ai usage`
  as the subprocess-env-propagation witness, with a comment documenting the
  dependency. The deterministic routing proof remains in the verbatim/unit tests,
  so this is no longer the sole (fragile) proof -- it adds back the
  client-subprocess coverage the prior commit dropped.

build / fmt / lint clean; 14 integration + 9 unit internal_dir tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NQ4uZG1GGWWCTbND93jJuR
@jwiegley jwiegley force-pushed the johnw/git-ai-dir branch from a5c2a9f to faf840d Compare July 1, 2026 17:34

jwiegley commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 0 new potential issues.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant