Add file system mounting support (FUSE, Cloud Files, NFS, FileProvider) #296
Open
chipsenkbeil wants to merge 197 commits into master from
Conversation
Reduce 6 file I/O request variants (FileRead, FileReadText, FileWrite, FileWriteText, FileAppend, FileAppendText) to 2 (FileRead, FileWrite) with options structs. This is groundwork for the file mount feature, which needs offset-based reads and writes.
- Add ReadFileOptions (offset, len) and WriteFileOptions (offset, append)
- Consolidate the Api trait from 6 methods to 2 with options parameters
- Update all three backends (host, ssh, docker) with options handling
- ChannelExt keeps convenience wrappers (read_file_text, append_file, etc.)
- Update all tests across the workspace
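The consolidated request shape can be sketched roughly as below. Only the fields named in the commit message (offset, len, append) are taken from the source; derives and exact types are assumptions.

```rust
/// Sketch of the consolidated options structs described above; the real
/// distant types may differ in derives and field types.
#[derive(Clone, Copy, Default)]
pub struct ReadFileOptions {
    /// Byte offset to start reading from (None = start of file).
    pub offset: Option<u64>,
    /// Maximum number of bytes to read (None = read to end of file).
    pub len: Option<u64>,
}

#[derive(Clone, Copy, Default)]
pub struct WriteFileOptions {
    /// Byte offset to write at (None = start of file).
    pub offset: Option<u64>,
    /// Append to the end of the file instead of truncating.
    pub append: bool,
}

/// The old FileAppend variant collapses into FileWrite with append = true.
pub fn append_options() -> WriteFileOptions {
    WriteFileOptions { offset: None, append: true }
}
```

With defaults, `FileRead`/`FileWrite` behave like the old whole-file variants, so the convenience wrappers on ChannelExt can stay thin.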
New crate providing the shared infrastructure for mounting remote filesystems locally. This includes:
- InodeTable: bidirectional inode<->RemotePath mapping with ref counting and LRU eviction (54 unit tests)
- Cache layer: AttrCache, DirCache, ReadCache with TTL + LRU eviction
- WriteBuffer: per-file write-back buffer with dirty range tracking
- RemoteFs: translation layer from filesystem ops to ChannelExt calls
- MountConfig/MountHandle: configuration and lifecycle management
- Backend module skeleton for platform-specific implementations

The mount feature is opt-in (not in default features) since it requires platform-specific libraries (macFUSE/libfuse).
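A minimal sketch of the inode<->path mapping with ref counting (the real InodeTable also does LRU eviction; all names and the refcount semantics shown here are illustrative):

```rust
use std::collections::HashMap;

/// Minimal bidirectional inode <-> remote-path table with ref counting.
/// Illustrative only; the real InodeTable adds LRU eviction and more.
struct InodeTable {
    next_ino: u64,
    by_ino: HashMap<u64, (String, u64)>, // ino -> (path, refcount)
    by_path: HashMap<String, u64>,       // path -> ino
}

impl InodeTable {
    fn new() -> Self {
        // Inode 1 is conventionally the mount root.
        let mut t = Self { next_ino: 2, by_ino: HashMap::new(), by_path: HashMap::new() };
        t.by_ino.insert(1, ("/".into(), 1));
        t.by_path.insert("/".into(), 1);
        t
    }

    /// Return the inode for a path, allocating one on first lookup and
    /// bumping the refcount on later lookups (mirrors FUSE `lookup`).
    fn lookup(&mut self, path: &str) -> u64 {
        if let Some(&ino) = self.by_path.get(path) {
            self.by_ino.get_mut(&ino).unwrap().1 += 1;
            return ino;
        }
        let ino = self.next_ino;
        self.next_ino += 1;
        self.by_ino.insert(ino, (path.to_string(), 1));
        self.by_path.insert(path.to_string(), ino);
        ino
    }

    /// Drop `n` references (mirrors FUSE `forget`); frees the entry at zero.
    fn forget(&mut self, ino: u64, n: u64) {
        let remove_path = match self.by_ino.get_mut(&ino) {
            Some((path, rc)) => {
                *rc = rc.saturating_sub(n);
                if *rc == 0 { Some(path.clone()) } else { None }
            }
            None => None,
        };
        if let Some(p) = remove_path {
            self.by_ino.remove(&ino);
            self.by_path.remove(&p);
        }
    }

    fn path_of(&self, ino: u64) -> Option<&str> {
        self.by_ino.get(&ino).map(|(p, _)| p.as_str())
    }
}
```

The bidirectional maps let backend callbacks translate in both directions: kernel-facing ops arrive with an inode, while ChannelExt calls need a remote path.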
Implements the FUSE backend for Linux, FreeBSD, and macOS using fuser:
- FuseHandler implements fuser::Filesystem, delegating all callbacks to RemoteFs (lookup, getattr, read, write, readdir, create, mkdir, unlink, rmdir, rename, flush, fsync, release, forget)
- Public mount() entry point creates RemoteFs and spawns a background FUSE session
- CLI `distant mount <MOUNT_POINT>` command with options for remote-root, readonly, and cache TTLs
- CLI `distant unmount <MOUNT_POINT>` for clean unmount

The mount feature is opt-in and requires macFUSE (macOS), libfuse (Linux), or FUSE (FreeBSD) to be installed.
Implements the Cloud Filter API backend for native File Explorer integration on Windows 10+ using the cloud-filter crate:
- CloudFilesHandler implements SyncFilter with fetch_data (on-demand hydration), fetch_placeholders (lazy directory population), deleted, and renamed callbacks
- Sync root registration/deregistration lifecycle
- Placeholder files appear natively in File Explorer

The windows-cloud-files feature requires Windows 10+ and the cloud-filter crate. Exact API compatibility will be verified when first compiled on Windows.
Implements a localhost NFSv3 server using the nfsserve crate that translates NFS operations to distant API calls:
- NfsHandler implements the NFSFileSystem trait (lookup, getattr, read, write, readdir, create, remove, rename, mkdir, setattr)
- Server binds to 127.0.0.1 on a random port
- OS-native mount_nfs command attaches the server
- Platform-specific mount commands for OpenBSD, NetBSD, Linux, macOS, and FreeBSD

The nfs feature is available on all Unix platforms, but the mount() entry point is gated to OpenBSD/NetBSD, where FUSE is unavailable.
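Binding to port 0 and letting the OS assign an ephemeral port is the usual way to get a "random port" on localhost; a sketch (the real backend presumably uses tokio's listener, and the mount-command flags shown in the comment are illustrative, not taken from the source):

```rust
use std::net::TcpListener;

/// Bind an NFS-style server socket to 127.0.0.1 on an OS-assigned port.
/// (std::net is used here for brevity; an async server would use tokio.)
fn bind_random_localhost_port() -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0 = OS picks one
    let port = listener.local_addr()?.port();
    // A platform-specific mount command would then attach, e.g. something
    // shaped like: mount_nfs -o port=<port> localhost:/ <MOUNT_POINT>
    Ok((listener, port))
}
```

Keeping the listener and the assigned port together avoids the race of picking a "free" port first and binding it later.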
Documents the architecture for the future FileProvider backend, which requires a .appex inside a .app bundle (hard Apple requirement):
- DistantMount.app: headless container app (LSBackgroundOnly=true)
- DistantFileProvider.appex: NSFileProviderReplicatedExtension
- IPC via App Group shared container for connection credentials
- Build via shell script (not Xcode), code signed with codesign

The macos-file-provider feature flag is defined but has no dependencies yet. Implementation requires the objc2 ecosystem crates and the .app bundle infrastructure.
Full implementation of NSFileProviderReplicatedExtension using objc2 and objc2-file-provider for native Finder integration on macOS 12+:
- DistantFileProvider: implements NSFileProviderReplicatedExtension with initWithDomain, invalidate, itemForIdentifier, fetchContents, createItem, modifyItem, deleteItem
- DistantFileProviderItem: implements NSFileProviderItemProtocol with itemIdentifier, parentItemIdentifier, filename
- DistantFileProviderEnumerator: implements NSFileProviderEnumerator with invalidate and enumerateItemsForObserver
- Global RemoteFs access via OnceLock for cross-process state
- mount_file_provider() public API for CLI --backend selection
- Added get_path/get_ino_for_path helpers to RemoteFs

Requires a .appex inside a .app bundle (Apple requirement). The macos-file-provider feature enables the objc2 dependency chain.
…nd macOS bundle

Replace the single `mount` feature with per-backend features (mount-fuse, mount-nfs, mount-windows-cloud-files, mount-macos-file-provider), defaulting to mount-nfs since nfsserve is pure Rust and compiles everywhere. Add a MountBackend enum with cfg-gated variants, Default/Display/FromStr impls, and unified mount() dispatch. The CLI exposes --backend to select a backend explicitly; the default auto-selects based on platform and whether the binary is running inside a .app bundle.

Add watch-based cache invalidation in RemoteFs: Arc-wrap the caches, spawn a best-effort watch task that invalidates the attr/dir/read caches on remote filesystem changes, falling back to TTL-only when watch is unsupported.

Add macOS FileProvider .app bundle infrastructure: Info.plist files, entitlements (production + dev), the build-macos-bundle.sh script, an extension entry point with NSBundle-based detection, and an App Group-aware socket path in constants.rs.
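The MountBackend enum with its Default/Display/FromStr impls might look roughly like this (the cfg gates per mount-* feature are omitted, and the string spellings are assumptions):

```rust
use std::fmt;
use std::str::FromStr;

/// Sketch of the MountBackend dispatch enum; the real enum cfg-gates each
/// variant behind its mount-* feature, which is omitted here.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum MountBackend {
    Fuse,
    #[default] // mount-nfs is the default feature, so Nfs is the default
    Nfs,
    WindowsCloudFiles,
    MacosFileProvider,
}

impl fmt::Display for MountBackend {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::Fuse => "fuse",
            Self::Nfs => "nfs",
            Self::WindowsCloudFiles => "windows-cloud-files",
            Self::MacosFileProvider => "macos-file-provider",
        })
    }
}

impl FromStr for MountBackend {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fuse" => Ok(Self::Fuse),
            "nfs" => Ok(Self::Nfs),
            "windows-cloud-files" => Ok(Self::WindowsCloudFiles),
            "macos-file-provider" => Ok(Self::MacosFileProvider),
            other => Err(format!("unknown mount backend: {other}")),
        }
    }
}
```

With Display/FromStr in place, clap can round-trip `--backend` values, and a later list of compiled-in backends only needs the cfg-gated variants.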
Include mount-fuse, mount-nfs, mount-windows-cloud-files, and mount-macos-file-provider in the LONG_VERSION feature list shown by --version.
App Sandbox with ad-hoc codesigning causes macOS to kill the process on launch. Remove sandbox and network.client entitlements (not needed for local dev), keep testing-mode for FileProvider, and add get-task-allow for debugger support.
Ad-hoc codesigning with entitlements produces signatures that Gatekeeper rejects in /Applications. Default to no entitlements for ad-hoc (local dev); set ENTITLEMENTS env var for distribution builds with a real signing identity.
Use PossibleValuesParser with MountBackend::available_backends() so clap lists the compiled-in backends as possible values for the --backend flag.
RemoteFs::new() uses rt.block_on(system_info()) to resolve the default remote root, which panics when called from within a tokio runtime. Resolve system_info in the async CLI handler and pass the result via MountConfig::remote_root so block_on is never needed.
Add needs_foreground flag to MountHandle. NFS/FUSE/Cloud Files backends need a foreground process (server stays alive), so the CLI blocks on Ctrl+C for those. FileProvider registers a persistent domain and exits immediately — macOS manages the .appex lifecycle.
When macOS launches the .appex (cold boot, Finder access), it calls initWithDomain: on DistantFileProvider. The extension now reads domain metadata (connection_id, destination) from NSUserDefaults, calls a resolver callback to get a Channel from the distant manager, and creates a RemoteFs for all subsequent file operations.
- distant-mount: add init(), bootstrap(), the ChannelResolver type, and TOKIO_HANDLE/CHANNEL_RESOLVER globals; modify initWithDomain to call bootstrap(); register_domain now persists metadata in NSUserDefaults and calls addDomain
- binary crate: run_extension() creates a tokio runtime, inits file logging, and registers the channel resolver; resolve_connection() tries the stored ID, then falls back to destination search; connect_headless() uses DummyAuthHandler with exponential backoff
- MountConfig gains an extra: Map field for backend-specific data
The .appex was invisible to fileproviderd due to missing Info.plist keys and broken code signing.

Extension-Info.plist:
- Add CFBundlePackageType (XPC!), CFBundleInfoDictionaryVersion, CFBundleShortVersionString, CFBundleSupportedPlatforms, CFBundleDisplayName — required for pluginkit discovery
- Add NSExtensionFileProviderDocumentGroup — required for fileproviderd to associate the extension with its group container

Entitlements (split app vs appex):
- distant-dev.entitlements: application-groups, network.client, get-task-allow (the app needs group access for NSUserDefaults writes)
- distant-appex-dev.entitlements: same plus app-sandbox and fileprovider.testing-mode (the appex requires the sandbox to launch)
- Note: restricted entitlements require a development certificate; ad-hoc signing will register the domain, but AMFI blocks the appex launch. Sign with CODESIGN_IDENTITY="Apple Development".

Build script:
- Use a hardlink instead of a symlink (codesign rejects symlinked executables)
- Separate ENTITLEMENTS and APPEX_ENTITLEMENTS variables
- Default to dev entitlements instead of none
- Remove a stale domain (same ID) before addDomain to handle re-mounts
- Include connection_id in the display name so multiple connections to the same host get unique CloudStorage folder paths
- Add APP_PROFILE / APPEX_PROFILE support to the build script for embedding provisioning profiles
- Remove fileprovider.testing-mode from the appex entitlements (requires separate Apple approval; not needed with proper provisioning)
…support

Replace metadata-file-based domain enumeration with the macOS getDomainsWithCompletionHandler and removeAllDomainsWithCompletionHandler APIs so orphaned domains (whose metadata was lost) are properly cleaned up. Remove symlink mount_point support — macOS manages the CloudStorage folder path, so FileProvider mounts now reject a mount_point argument.
…oup.dev.distant and remove dev entitlements
…nection info using JSON, and add log_level as an option for the macOS provider via connection info, with trace logging for the implementation
Replace the broken ThreadedRemoteFs with a new Runtime struct that bridges sync backend callbacks (FUSE, FileProvider) to async RemoteFs operations via a tokio Handle + OnceCell + watch channel.
- Runtime::new() for lazy init (.appex bootstrap path)
- Runtime::with_fs() for eager init (normal mount path)
- Runtime::spawn() dispatches async work, waits for init

Update the FUSE backend for the fuser 0.17 API (INodeNo, Errno, FileHandle, Generation, etc.) and dispatch all callbacks through Runtime::spawn().

Update the FileProvider backend with an UnsafeSendable<T> wrapper for !Send Apple types, async dispatch via Runtime::spawn(), and proper module imports across the provider/enumerator/item hierarchy.

Fix binary crate integration: missing .await on mount(), refactor macos_file_provider.rs into macos_appex.rs, fix PossibleValuesParser.
…uff appropriately; fix warnings about dead code
The FileProvider framework validates items at runtime and aborts if itemVersion is missing. Add itemVersion (mtime-based), capabilities (read/write/delete/rename/reparent), and enumerator sync anchor stubs to suppress the degraded-performance warnings. Also add logs-appex.sh script for quick appex log inspection.
- Handle NSFileProviderWorkingSetContainerItemIdentifier (primary blocker for "Loading..." in Finder) and the trash container by returning empty results immediately
- Fix root container handling in itemForIdentifier: use the framework constant as the item identifier with "/" as the filename (an empty filename caused a CRIT: missing filename crash)
- Fix parent identifiers everywhere: use the root constant when the parent ino=1 instead of the hardcoded "1" (enumerator, fetchContents, modifyItem, createItem)
- Add a resolve_parent_identifier helper for consistent parent ID resolution across all handlers
- Fix createItem to read file content from the URL and write it to the remote (it was creating empty files); use conformsToType for dir detection
- Signal bootstrap failures to Finder via NSFileProviderErrorCode::ServerUnreachable instead of returning empty results
- Add make_fp_error for proper NSFileProviderErrorDomain errors
- Add a "Distant — " prefix to domain display names in the Finder sidebar
- Clean up /tmp/distant_fp_* temp files on unmount
- Fix logs-appex.sh to check the App Group container path
- Add channel resolver outcome logging in the appex entry point
- Complete structured logging for enumerateChanges
Lists registered FileProvider domains with their metadata: domain identifier, display name, metadata file presence, and destination. Supports --format shell (table) and --format json output. Exposes list_file_provider_domains() and DomainInfo from distant-mount for the binary crate to use.
Skip remove_domain_blocking when domain doesn't exist — calling removeDomain on a non-existent domain causes fileproviderd to unregister the extension entirely, preventing the appex from launching. Now checks get_all_domains() first and only removes if the domain ID is found. Result: appex now launches and bootstraps successfully. 20/37 FP tests pass (up from 11). Remaining 17 failures are "No such file or directory" (enumeration timing) rather than "Operation timed out" (connectivity).
- signal_enumerator_for_domain: an async function that calls NSFileProviderManager.signalEnumerator after bootstrap + cache warm, telling macOS to enumerate now that the appex is connected
- wait_for_fp_mount_ready: polls read_dir until the mount is accessible (restores the implicit wait removed with discover_cloud_storage_entry)
- enumerateChanges returns SyncAnchorExpired to force fresh enumeration on every access (the remote FS has no change tracking)

20/37 FP tests pass. The remaining 17 fail because tests seed data after mount and macOS caching prevents immediate visibility — this requires the disconnect/reconnect architecture (TODO #9) for a proper fix.
Production:
- Signal enumerator targets the working set (was root — only the working set works)
- Working set enumerator returns root items (was empty)
- enumerateChanges returns syncAnchorExpired to force fresh enumeration
- Configurable poll_interval via MountConfig.extra (default 5s)
- FP mount spawns a background task that signals the working set periodically
- Mount CLI gets an --extra key=value flag for backend-specific config

Test infrastructure:
- FP mounts use poll_interval=0.05 (50ms) for fast refresh
- wait_for_fp_path helper polls the local path until visible (10s timeout)

21/37 FP tests pass. The remaining 16 need wait_for_fp_path calls in tests.
Added mount::wait_for_path(mount_backend, path) — polls local path for FP mounts (no-op for NFS/FUSE/WCF). FileProvider refreshes directory listings asynchronously via the manager's 50ms poll_interval in tests. Tests now wait for the FP refresh before reading through the mount. 37/37 FP tests pass. Combined with NFS/FUSE/Docker: 228/228 total.
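The polling helper can be sketched as below; the name follows the commit message, while the interval, timeout handling, and signature details are assumptions:

```rust
use std::path::Path;
use std::time::{Duration, Instant};

/// Sketch of a wait_for_path-style helper: FileProvider surfaces new
/// entries asynchronously, so tests poll the local path until it appears.
/// The 50ms interval here is illustrative.
fn wait_for_path(path: &Path, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if path.exists() {
            return true; // the entry has materialized locally
        }
        if Instant::now() >= deadline {
            return false; // give up; the caller reports a failure
        }
        std::thread::sleep(Duration::from_millis(50));
    }
}
```

For NFS/FUSE/WCF the real helper is a no-op, since those backends expose remote changes synchronously through the kernel mount.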
…plan

Move docs/mount-tests-PRD.md → PRD.md and docs/mount-tests-progress.md → PROGRESS.md so they live alongside README.md as canonical project references. Correct the stale "9 FP failures remain" status — the FP suite has been at 37/37 since 86d794d, with the total at 228/228.

Embed the full Network Resilience + Mount Health plan into PRD.md as a new "Plan: Network Resilience + Mount Health" section so the plan survives context compaction. PROGRESS.md gains an "Active plan" pointer at the top plus a Phase 0 checklist (0a–0j) tracking the incorporation of PR #288 and a Phase 1–6 checklist for the mount health work that follows.

Update .claude/commands/mount-test-loop.md to point at the new top-level paths.
…epalive

Move socket2 from unix-only to cross-platform dependencies and configure SO_KEEPALIVE (15s idle, 5s probe interval) on every TCP stream owned by distant: client connect, transport reconnect, and listener accept. Cherry-picked from #288 (commit 61e48c0).

Address review comment 2933801998 ("Does this one function need to be pulled up to be available?") by exposing keepalive through the public TcpTransport surface instead of a `pub(crate) use` re-export of a free helper:
- TcpTransport gains a `set_keepalive(&self) -> io::Result<()>` method that does the SO_KEEPALIVE configuration via socket2::SockRef.
- TcpTransport gains `from_accepted(stream, peer_addr) -> Self`, which the listener uses to wrap accepted streams; it sets keepalive internally.
- TcpListener::accept stops reaching into a private helper and just delegates to TcpTransport::from_accepted.
- TcpTransport::connect / Reconnectable::reconnect call the new set_keepalive method internally so callers get keepalive for free.

Keepalive failures log a warning but do not fail the connection.
Add max_heartbeat_failures (default 3) to ServerConfig. The connection loop now counts consecutive non-WouldBlock heartbeat write errors and terminates the connection when the threshold is reached. The counter resets on any successful write (heartbeat or response). Setting the value to 0 disables the feature. Backward-compatible via serde default. Cherry-picked from #288 (commit fa40953).
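The counting logic described above is simple enough to sketch directly; the struct and method names below are illustrative, not the actual implementation:

```rust
/// Sketch of the consecutive-heartbeat-failure counter described above.
/// A threshold of 0 disables the check; the counter resets on any
/// successful write (heartbeat or response).
struct HeartbeatTracker {
    max_failures: u32, // ServerConfig::max_heartbeat_failures (default 3)
    consecutive: u32,
}

impl HeartbeatTracker {
    fn new(max_failures: u32) -> Self {
        Self { max_failures, consecutive: 0 }
    }

    /// Record a failed heartbeat write; returns true when the connection
    /// should be terminated. The caller skips WouldBlock errors, so only
    /// hard failures reach this method.
    fn record_failure(&mut self) -> bool {
        self.consecutive += 1;
        self.max_failures != 0 && self.consecutive >= self.max_failures
    }

    /// Any successful write resets the counter.
    fn record_success(&mut self) {
        self.consecutive = 0;
    }
}
```

Counting only consecutive failures (rather than total) means a flaky-but-alive link never accumulates enough errors to be killed.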
start_file_provider and get_or_start_file_provider in singleton.rs call crate::mount::install_test_app(), but the mount module is gated behind the mount feature. Building distant-test-harness without the mount feature (which any subset --all-features test of distant-core's plugin deps does) errored out with "could not find mount in the crate root". Add `feature = "mount"` to the existing `target_os = "macos"` cfg on both functions so they only compile when the dependency is actually available. All callers (in mount.rs) are already gated on the same feature.
Extend the Plugin trait with default reconnect() (returns Unsupported) and reconnect_strategy() (returns Fail). All three plugins override both: Host, SSH, and Docker delegate reconnect to connect and return ExponentialBackoff with backend-appropriate parameters: - Host: 3 retries / 2s base / 30s max / 60s timeout - SSH: 5 retries / 2s base / 30s max / 30s timeout - Docker: 10 retries / 1s base / 60s max / 30s timeout (slow daemon restarts deserve more patience) Cherry-picked from #288 (commit 3660e62). Strip the new separator-style test section comments per review comments 2915971580 / 2933755107 / 2933823312 (CLAUDE.md anti-pattern #11) and rename per-crate test functions to disambiguate them across crates (docker_reconnect_strategy_returns_*, ssh_reconnect_strategy_returns_*).
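The per-plugin parameters above (retries / base / max) plug into a standard capped-doubling backoff; a sketch of the resulting sleep schedule, under the assumption that ReconnectStrategy doubles the sleep per attempt up to the max:

```rust
use std::time::Duration;

/// Sketch of a capped exponential-backoff sleep schedule matching the
/// per-plugin parameters above. Illustrative, not the actual
/// ReconnectStrategy implementation.
fn backoff_schedule(retries: u32, base: Duration, max: Duration) -> Vec<Duration> {
    (0..retries)
        .map(|attempt| {
            // base * 2^attempt, saturating, then clamped to max
            base.saturating_mul(1u32 << attempt.min(16)).min(max)
        })
        .collect()
}
```

Under this assumption, SSH (5 retries / 2s base / 30s max) would sleep 2s, 4s, 8s, 16s, 30s, while Docker's 60s max leaves more headroom for slow daemon restarts.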
…ination Add ShutdownSender type and ServerRef::shutdown_sender() for lightweight shutdown signaling. The SSH backend spawns a health monitor that polls api.is_session_closed() every 2s. The Docker backend polls daemon ping and container state every 5s. When the backend dies, the health monitor triggers server shutdown, dropping the in-memory transport so the manager can detect the disconnection. Add ApiServerHandler::from_arc(Arc<T>) so the in-memory server can share its API instance with a health-monitor task that needs to query backend liveness. ChannelPool::is_closed and SshApi::is_session_closed are async on file-mount because the russh handle lives behind a tokio Mutex (added for tcpip_forward, see russh#658). The poll loop awaits both. Cherry-picked from #288 (commit 993ed8d). Resolved conflicts in distant-ssh/src/lib.rs (file-mount's tunnel state + SshApi 5-arg constructor needed to coexist with PR #288's Arc<SshApi> wrapping and ssh_health_monitor). Also added the `options` field to two test cases in distant-core/src/api.rs that the file-mount branch added since the cherry-pick base.
ManagerConnection::spawn() now clones the UntypedClient's ConnectionWatcher and optionally spawns a monitor task that sends the connection ID through a death notification channel when the connection transitions to Disconnected. ManagerServer wires this up for all connections created via connect(). For now the death-handling task in ManagerServer::new just logs the disconnect — full reconnection orchestration arrives in step 0h (adapted from PR #288 commit aa035a8). Step 0f only plumbs the watcher through. Cherry-picked from #288 (commit 594c3ca). Resolved conflicts in distant-core/src/net/manager/server.rs to keep file-mount's mount/tunnel struct fields and ManagerServer::new constructor alongside the new death_tx field.
…ations

Adds the manager-side push protocol that PR #288 began, but reshapes it per the review thread on #288 (comments 2933812110, 2933814790, 2933821911, 2933826601). Instead of bespoke SubscribeConnectionEvents / SubscribedConnectionEvents / ConnectionStateChanged / Reconnect / ReconnectInitiated variants, the protocol now exposes a generic three-piece API:

    ManagerRequest::Subscribe { topics: Vec<EventTopic> }
    ManagerRequest::Unsubscribe
    ManagerRequest::Reconnect { id: ConnectionId }

    ManagerResponse::Subscribed
    ManagerResponse::Unsubscribed
    ManagerResponse::Event { event: Event }
    ManagerResponse::ReconnectInitiated { id: ConnectionId }

A new `distant-core/src/net/manager/data/event.rs` module defines:
- `EventTopic { All, Connection, Mount }` — subscribers filter on topics; `All` matches every variant, present and future. `Mount` is reserved (no producers yet — Phase 1 of the mount-health work ships `Event::MountState` together with the typed `MountStatus` enum).
- `Event { ConnectionState { id, state } }` — a tagged event enum. Future variants (mount, tunnel, server status) plug in here without protocol additions.
- `Event::topic(&self) -> EventTopic` — used by the dispatcher in step 0h to filter pushed events for clients that subscribed to specific topics.

Wire shape (JSON):

    {"type":"subscribe","topics":["connection","mount"]}
    {"type":"event","event":{"type":"connection_state","id":7,"state":"reconnecting"}}

To make this protocol layer fully functional in step 0h:
- ConnectionState gains `Serialize`/`Deserialize` (snake_case) so it can ride the wire as part of `Event::ConnectionState`.
- ReconnectStrategy::initial_sleep_duration and adjust_sleep are promoted from private to `pub` so the orchestration in 0h can drive its own retry loop.

Stub handlers in `ManagerServer` return `Error` responses for the new request variants; step 0h replaces them with the real broadcast::channel + handle_reconnection wiring.
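The topic-filtering contract can be sketched as below; payloads are trimmed to what the commit message shows, and the filter-function name is an assumption:

```rust
/// Sketch of the generic event/topic types described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EventTopic {
    All,
    Connection,
    Mount,
}

#[derive(Clone, Debug)]
enum Event {
    ConnectionState { id: u64, state: &'static str },
    // Future variants (mount, tunnel, server status) plug in here.
}

impl Event {
    /// Each event maps to exactly one topic; the dispatcher filters on it.
    fn topic(&self) -> EventTopic {
        match self {
            Event::ConnectionState { .. } => EventTopic::Connection,
        }
    }
}

/// Forwarder-side filter: a subscription matches when it asked for All or
/// for the event's own topic.
fn subscription_matches(topics: &[EventTopic], event: &Event) -> bool {
    topics.iter().any(|t| *t == EventTopic::All || *t == event.topic())
}
```

Because `All` matches by rule rather than by enumeration, new Event variants are delivered to existing `All` subscribers without any protocol change.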
Add handle_reconnection() to orchestrate plugin reconnection when a
connection dies. The death loop in ManagerServer::new now drives this
function instead of just logging the disconnect:
1. Read the connection's destination + options under a brief read lock.
2. Look up the plugin by scheme.
3. If reconnect_strategy() is Fail, broadcast Disconnected and stop.
4. Honor the `no_reconnect` option from the CLI flag (added in 0i).
5. Broadcast Reconnecting and enter the retry loop, sleeping per the
plugin's strategy and timing each attempt against strategy.timeout().
6. On success, hot-swap the connection via ManagerConnection::replace_client
and broadcast Connected. On exhaustion, broadcast Disconnected.
Add ManagerConnection::replace_client which aborts old request /
response / monitor tasks, mints a fresh action task with a new
request_tx, and spawns a new connection monitor with the death_tx.
Existing channels are invalidated by design — callers must re-open
them after replacement.
Add NonInteractiveAuthenticator: a no-prompt Authenticator used during
background reconnection. challenge() fails with PermissionDenied
(callers using key-file or ssh-agent auth never invoke it); verify()
auto-accepts host verification because the host was verified on the
original connect.
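The retry loop in steps 3–6 can be sketched as a skeleton like the one below; `attempt_connect` stands in for the plugin's reconnect call, `broadcast` for publish_connection_state, and the capped-doubling sleep is an assumption about the strategy's shape:

```rust
use std::time::{Duration, Instant};

/// Illustrative skeleton of the reconnection retry loop (steps 3-6 above),
/// not the actual handle_reconnection implementation.
fn retry_with_strategy(
    retries: u32,
    timeout: Duration,
    mut sleep: Duration,
    max_sleep: Duration,
    mut attempt_connect: impl FnMut() -> bool,
    mut broadcast: impl FnMut(&str),
) -> bool {
    broadcast("reconnecting");
    for _ in 0..retries {
        std::thread::sleep(sleep); // per-strategy sleep before each attempt
        let started = Instant::now();
        // Each attempt is timed against the strategy's timeout; a success
        // that arrives too late is treated as a failure here (simplified).
        if attempt_connect() && started.elapsed() <= timeout {
            broadcast("connected"); // step 6: hot-swap succeeded
            return true;
        }
        sleep = (sleep * 2).min(max_sleep); // capped doubling
    }
    broadcast("disconnected"); // retries exhausted
    false
}
```

The real orchestration is async and swaps the live client via ManagerConnection::replace_client on success; this sketch only shows the control flow and the broadcast points.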
Wire the protocol stubs from step 0g into the real implementations:
- Subscribe { topics } now spawns a forwarder task that drains the
broadcast bus, filters events by the requested topics
(EventTopic::All matches everything), and pushes
ManagerResponse::Event { event } back through the channel reply.
- Unsubscribe acks immediately. The forwarder task tied to the channel
exits naturally when the reply stream closes; per-channel teardown
while keeping the channel open is a future refinement.
- Reconnect { id } verifies the connection exists, then spawns
handle_reconnection in the background and returns ReconnectInitiated.
State transitions arrive later as Event::ConnectionState pushes.
Replace the placeholder publish helper from PR #288 with
publish_connection_state, which sends Event::ConnectionState into the
broadcast::Sender<Event> bus. ManagerServer gains an event_tx field;
broadcast::channel<Event> capacity is 16.
Cherry-picked from #288 (commit aa035a8) and
adapted to:
- Use the generic Subscribe/Event protocol from step 0g instead of
the bespoke SubscribeConnectionEvents/ConnectionStateChanged.
- Coexist with the file-mount branch's mount + tunnel struct fields
and request handlers.
- Match the existing reply_err helper and the file-mount tunnel +
mount handler ordering inside the request match.
Imports the new ConnectionState serde tests and ReconnectStrategy
{initial_sleep_duration, adjust_sleep} unit tests from the upstream
commit, with separator-style comments stripped per the review thread.
Cherry-picked from #288 (commit c40c543), adapted to use the generic Subscribe/Event protocol from step 0g instead of the bespoke SubscribeConnectionEvents helpers PR #288 originally shipped.

CLI helper changes:
- src/cli/common/client.rs: replace subscribe_and_display_connection_events(client, format) with subscribe_and_display_events(client, topics, format), accepting a Vec<EventTopic>. Long-running CLI commands (Shell, Api, Spawn, Ssh) now subscribe with [Connection, Mount] so a backgrounded mount drop surfaces in the same stderr/JSON stream as connection drops. The JSON shape mirrors the wire format: {"type":"event","event":{"type":"connection_state",...}}.
- A new display_event() helper renders each Event variant in both Format::Shell and Format::Json, ready to be extended for the Event::MountState variant that lands in Phase 1.

ManagerClient API:
- ManagerClient::subscribe(topics) → io::Result<Mailbox<...>>: sends Subscribe { topics }, waits for the Subscribed ack, then returns the mailbox so callers don't see the ack mixed with events.
- ManagerClient::unsubscribe() → io::Result<()>: best-effort hint.
- ManagerClient::reconnect(id) → io::Result<()>: sends Reconnect { id }, waits for ReconnectInitiated. The actual state transitions arrive later as Event::ConnectionState pushes on any open subscription.

CLI command:
- distant client reconnect <id> uses ManagerClient::reconnect. Format::Json prints {"type":"reconnect_initiated","id":<id>}; Format::Shell prints a Ui::success line.
Cherry-picked from #288 (commit a12a240).

Add --no-reconnect to the Connect, Launch, and Ssh client subcommands to disable automatic reconnection on connection loss. The flag is plumbed through the options Map (`no_reconnect=true`) into the manager's reconnection orchestration, where handle_reconnection checks for it before doing any work and broadcasts Disconnected straight away.

Add --heartbeat-interval and --max-heartbeat-failures to the server Listen subcommand for configuring the heartbeat counter introduced in step 0c.

Renamed notify_state_change → publish_connection_state at one follow-on call site that came in with this commit's no_reconnect check (the rest were renamed in step 0h).
Replace MountInfo.status: String with a typed MountStatus enum:
pub enum MountStatus {
Active,
Reconnecting,
Disconnected,
Failed { reason: String },
}
The state machine is documented inline. `Failed` is terminal — the
only exit is to unmount and remount. The vocabulary is deliberately
distinct from net::client::ConnectionState (Connected/Reconnecting/
Disconnected, no Failed) so the user can tell at a glance which
subsystem they're looking at and so mount-side terminal failures are
distinguishable from transient connection drops.
`#[serde(tag = "state", rename_all = "snake_case")]` keeps the wire
shape stable across the inner and outer state representations:
{"state":"active"}
{"state":"reconnecting"}
{"state":"disconnected"}
{"state":"failed","reason":"fuse session ended"}
Add Event::MountState { id, state: MountStatus } to the generic
event bus, with Event::topic() returning EventTopic::Mount. The
producer wires up in Phase 3 (per-mount monitor task) — Phase 1
just establishes the wire shape and the CLI rendering so the rest
of the work can plug in cleanly.
CLI changes:
- src/cli/commands/client.rs gains a format_mount_status helper
for the shell rendering of `distant status --show mount`.
Active/Reconnecting/Disconnected render as their lowercase
variant names; Failed renders as `failed: <reason>` so the
failure cause is visible in the same row as the mount.
- src/cli/common/client.rs::display_event learns the
Event::MountState variant. Shell prints
`[distant] mount N: failed (<reason>)`; JSON nests the
serialized MountStatus inside the {"type":"event","event":...}
envelope.
Round-trip tests cover both the new MountStatus serde shape and
the Event::MountState topic mapping.
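The shell rendering described above can be sketched as follows (the serde derives and wire tagging on the real MountStatus are omitted, and the helper's exact signature is an assumption):

```rust
/// Sketch of MountStatus and the format_mount_status shell helper.
enum MountStatus {
    Active,
    Reconnecting,
    Disconnected,
    Failed { reason: String },
}

/// Shell rendering for `distant status --show mount`: lowercase variant
/// names, with the failure cause inlined for Failed so it is visible in
/// the same row as the mount.
fn format_mount_status(status: &MountStatus) -> String {
    match status {
        MountStatus::Active => "active".to_string(),
        MountStatus::Reconnecting => "reconnecting".to_string(),
        MountStatus::Disconnected => "disconnected".to_string(),
        MountStatus::Failed { reason } => format!("failed: {reason}"),
    }
}
```

Keeping the reason out of the variant name but in the rendered row preserves the stable wire vocabulary while still surfacing the terminal-failure cause to the user.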
Add a `probe(&self) -> MountProbe` default method to the MountHandle
trait so the manager's per-mount monitor task (Phase 3) can poll
each backend for liveness without coupling to backend-specific
internals.
pub enum MountProbe {
Healthy,
Degraded(String),
Failed(String),
}
The default impl returns `Healthy` so existing backends continue to
work without changes — Phase 4 wires up the real per-backend probes
(NFS server task alive, FUSE BackgroundSession alive, FP domain
registered + appex bootstrap, WCF watcher thread alive).
`probe` is `&self` (no `&mut`) so it can be called concurrently
with `unmount` and other operations: the monitor task locks an
Arc<Mutex<Option<Box<dyn MountHandle>>>> read-only.
Re-exported from distant_core::plugin alongside MountHandle and
MountPlugin.
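The probe contract can be sketched as below; the trait shown is a trimmed stand-in for the real MountHandle (which has unmount and other methods), illustrating only the `&self` default described above:

```rust
/// Sketch of the MountProbe contract: probe() takes &self so the monitor
/// can poll concurrently with unmount and other operations.
enum MountProbe {
    Healthy,
    Degraded(String),
    Failed(String),
}

/// Trimmed stand-in for the MountHandle trait; only the probe default is
/// shown here.
trait MountHandle {
    /// Backend liveness poll; the default reports Healthy so existing
    /// backends keep working unchanged until Phase 4 wires real probes.
    fn probe(&self) -> MountProbe {
        MountProbe::Healthy
    }
}

/// A backend with no override inherits the Healthy default.
struct NoopHandle;
impl MountHandle for NoopHandle {}
```

Because the default is total (no Result, no panic path), adding the method is a non-breaking change for every downstream backend implementation.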
Wire the generic event bus from Step 0 into the mount lifecycle. Each managed mount now has a dedicated monitor task that polls the backend's MountHandle::probe (Phase 2) and reacts to Event::ConnectionState pushes for its underlying connection, publishing Event::MountState transitions through the broadcast bus.

ManagedMount restructure:
- info: Arc<RwLock<MountInfo>> so the monitor can update status without blocking the outer self.mounts write lock.
- handle: Arc<Mutex<Option<Box<dyn MountHandle>>>> so the monitor can call probe(&self) under a brief read lock while the unmount path retains exclusive access via .lock().await.take().
- monitor: tokio::task::JoinHandle<()> aborted on unmount and on connection kill.

monitor_mount task body (top of server.rs after the publish helpers):
- 5s tokio::time::interval (configurable via Config::mount_health_interval — defaults to DEFAULT_MOUNT_HEALTH_INTERVAL).
- tokio::select! between the ticker (calls probe) and an event_rx.subscribe() receiver that drains the broadcast bus filtered for Event::ConnectionState matching the monitor's connection_id.
- Two pure helpers map probe → status and connection state → status without holding any locks: probe_to_status, connection_state_to_mount_status.
- Failed status is terminal — the monitor logs and exits without publishing further events.
- Lagged broadcast receiver warnings are surfaced; a closed bus causes the monitor to exit cleanly.

The publish_mount_state helper sits next to publish_connection_state. The mount handler now wraps info/handle in the new types and spawns the monitor before inserting the ManagedMount into self.mounts. The unmount handler aborts the monitor first, then takes the handle out of the Mutex via .lock().await.take() before calling handle.unmount(). If the monitor or another caller already took the handle, it logs and continues.
The list handler now snapshots Arc<RwLock<MountInfo>> values under the outer read lock, then locks each individual info to clone it, avoiding holding the outer lock across .await.

Latent kill-leak fix: ManagerServer::kill(id) now tears down every mount whose connection_id matches. Previously, killing an SSH/Host/Docker connection that had mounts on it would orphan the mounts in the map with stale Active status — the kill code followed the tunnel-cleanup pattern but missed mounts entirely.

Config gains mount_health_interval: Duration (default 5s) so the monitor poll interval is tunable per-manager. The cfg(test) test_config helper now delegates to Config::default so future Config additions don't break tests.

All 2265 distant-core lib tests pass; the mount integration smoke test (status_should_show_active_mount on host_nfs) confirms nothing regressed.
Wire the per-mount monitor task (Phase 3) up to real backend
liveness signals.
Add `is_alive()` to the concrete `core::MountHandle` (pub(crate)
on the wrapper side, used only by the trait impl). It returns
`true` while the outer background task has not yet completed:
catches panics and premature exits, but not finer-grained inner
failures (FUSE thread death, NFS server task panic without the
outer wrapper noticing). The doc-comment says so explicitly so
future contributors don't expect more from it.
Wire `MountHandleWrapper::probe` in distant-mount/src/plugin.rs:
- For all backends: returns `Failed("mount task ended")` if the
outer task has ended.
- For FileProvider: additionally calls
`list_file_provider_domains()` and returns
`Failed("FileProvider domain ... no longer registered")` if the
domain has disappeared from the OS-side list (e.g. user toggled
the File Providers setting). If the listing call itself errors,
returns `Degraded(...)` rather than failing the mount — the
OS API may be temporarily unavailable.
Granular per-backend signals (lifting NFS `server_task` into an
`AtomicBool`, watching `BackgroundSession.guard.is_finished()`,
checking the WCF watcher thread, etc.) are deferred. The current
coverage gives the monitor enough to react to wholesale mount
death; finer-grained probes can be added incrementally without
changing the monitor or the event bus shape.
Unit test layer for the mount health subsystem (distant-core):
- protocol::mount::mount_status_tests covers MountStatus serde:
default is Active, every variant round-trips through JSON,
Failed { reason } requires the reason field, and bogus state
values fail to parse rather than silently default.
- net::manager::server::tests grows coverage for the per-mount
monitor's pure helper functions:
- probe_to_status (8 cases): Healthy is no-op when already
Active, restores Active from Reconnecting/Disconnected, won't
revive Failed; Degraded never changes state; Failed
transitions Active to Failed with reason but doesn't
re-trigger on already-Failed.
- connection_state_to_mount_status (6 cases): Connected
restores Reconnecting/Disconnected to Active and is no-op
when already Active; Reconnecting only transitions Active;
Disconnected transitions Active and Reconnecting but not
Failed.
- publish_mount_state happy path through the broadcast bus.
- Three end-to-end monitor_mount tests with a scripted test-double
MountHandle (`ScriptedMountHandle`) that pops MountProbe values
off a shared queue:
- monitor_mount_publishes_failed_event_when_probe_returns_failed:
a single Failed probe causes both an Event::MountState
publish and an info status update to Failed.
- monitor_mount_reacts_to_connection_state_event: publishing
Event::ConnectionState::Reconnecting on the bus causes the
monitor to transition the mount to Reconnecting and publish a
matching Event::MountState.
- monitor_mount_ignores_connection_state_for_other_connection:
a ConnectionState event for a DIFFERENT connection_id leaves
the mount untouched.
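The scripted test-double can be as small as a shared queue of canned probe results; a sketch under the assumption that probes default to Healthy once the script is exhausted (types simplified from the commit text):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
enum MountProbe {
    Healthy,
    Failed(String),
}

/// Test double: each probe() call pops the next scripted value off a
/// shared queue; an empty queue defaults to Healthy.
struct ScriptedMountHandle {
    script: Arc<Mutex<VecDeque<MountProbe>>>,
}

impl ScriptedMountHandle {
    fn new(values: Vec<MountProbe>) -> Self {
        Self {
            script: Arc::new(Mutex::new(values.into_iter().collect())),
        }
    }

    fn probe(&self) -> MountProbe {
        self.script
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or(MountProbe::Healthy)
    }
}
```

Sharing the queue behind Arc lets the test keep a handle to it and push further values while the monitor is already running.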
CLI integration test (HLT-05):
- tests/cli/mount/health.rs adds
kill_should_remove_mounts_owned_by_connection. The test starts
an isolated host manager, finds the connection_id via
`distant status --show connection --format json`, mounts NFS on
it, kills the connection via `distant kill <id>`, then polls
`distant status --show mount` for up to 10s and asserts the
mount is gone. Without the kill-leak fix in Phase 3 the
manager's self.mounts map would still contain the mount with
stale Active status — this is a regression test for that bug.
Total: 2291 distant-core lib tests pass; HLT-05 passes against a
fresh isolated manager. EVT-* and HLT-01..04 (which require
killing sshd / connection drops in the singleton harness) are
deferred to a follow-up.
Mark all 0a–0j and Phases 1–5 boxes as complete in PROGRESS.md with the commit hashes for each. Phase 6 (this commit) is the docs roll-up itself.

Update PRD.md's status section to reflect 228/228 mount tests + 2291 distant-core lib tests passing, with a highlight bullet list of what landed (generic Subscribe/Event protocol, MountStatus enum, per-mount monitor, kill-leak fix, network resilience stack from PR #288, CLI flags). Document the deferred items (HLT-01..04, EVT-01..02, granular per-backend probes, process audit, Windows VM testing) so the next session has a clear list to work from.

docs/CHANGELOG.md gains an Unreleased section listing every user-facing addition and the two breaking changes (`MountInfo.status` enum, `kill(id)` cleans mounts).
Capture the friction observed during the Network Resilience + Mount Health rollout (Phases 0–6, commits eb0747b → 6de03b1) and turn each incident into a concrete next-slice phase. PRD.md gains two new sections:

1. "Lessons from Phase 0–6 implementation (2026-04-07)": the post-mortem inventory. Each subsection documents one incident, its root cause, and the phase that addresses it:
- Stale singleton state was the #1 friction source (the wire-format change in Phase 1 caused silent "No mounts found" failures across every FP test until I manually pkill'd the singleton)
- "No mounts found" panic messages were uninformative
- Test harness compilation was fragile under feature subsets
- Cherry-pick conflict resolution was lossy (separator comments, missed renames, wire-format field updates)
- Tests didn't catch the orphan-mount latent bug
- Background tasks vs. foreground tasks vs. timeouts
- The build cycle added 10–30s of latency between commits
- Test author boilerplate was too high (HLT-05 had two CLI subcommand typos on the first attempt)
- Flakes are masked by retries

2. "Plan: Test Quality & Stability": Phases E–K with goals, agent usage, per-phase deliverables, and acceptance criteria. Phases are ordered by dependency:
- Phase E: state hygiene (cleanup script, build-hash validation in singleton meta files, FP domain bulk reset)
- Phase F: diagnostics (assert_mount_status! macro, singleton diagnostic dump, inline log tail in panic hook)
- Phase G: test isolation (Owned-singleton scope, PID-locked sentinels, RAII tempdirs)
- Phase H: coverage (wire-format fixtures, HLT-01..04 + EVT-01..02, cross-version compatibility, soak tests, per-backend probes, proptest round-trips)
- Phase I: simplification (typed DistantCmd builder, fixture set, mock handles in test-harness, dev-fast profile + linker docs)
- Phase J: CI (nextest profile tweaks, preflight script, test result triage)
- Phase K: documentation (TESTING.md additions, CLAUDE.md test author checklist)

Each phase ties back to a specific session incident, draws from industry practice (gitoxide pack-format snapshots, tokio proptest codec testing, nextest retry policies), and never removes existing coverage.

PROGRESS.md gains the corresponding checklist under "Phases E–K — Test Quality & Stability (next slice)" with one checkbox per sub-phase, cross-referenced to the PRD section. Phase 6 remains the final commit of the previous slice; this docs update is the bridge into the next slice.
After 30+ minutes of dedicated research into nextest internals,
ctor/dtor crates, process supervision patterns, and the actual
harness code (3 parallel agents producing 3000+ lines of
research notes archived under ~/.claude/plans/), the conclusion
is that the singleton-for-everything model is the bug.
**The earlier draft of Phases E–K is REPLACED with a smaller,
sharper plan** built around three architectural changes:
1. **Per-test ephemeral fixtures for the 80% case.** 37 of 39
mount tests don't need a singleton — they only got one
because spawning a fresh manager+server per test was slow.
With command-group + pdeathsig (Linux) + kqueue NOTE_EXIT
(macOS) + the existing tempfile RAII, per-test cost is
~100ms and SIGKILL handling is automatic.
2. **A tiny Ryuk-style sidecar reaper for the FP appex** (the
one true singleton — macOS allows one File Provider
extension instance per bundle ID per machine). Connection
lease lifecycle, schema-hash in socket path, self-heals
stale state on startup.
3. **Schema-hash baked into all singleton paths** so binaries
from different wire formats automatically use different
paths. Wire-format mismatch becomes structurally impossible
— no silent "No mounts found" failures, no manual cleanup.
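The schema-hash idea in item 3 can be sketched as deriving the singleton socket path from a hash of the wire-format definition. In the real plan WIRE_SCHEMA_HASH would be a compile-time constant; this sketch approximates it with a runtime hash of a schema description string, and both function names are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Approximation of a compile-time WIRE_SCHEMA_HASH: any change to the
// schema description yields a different 64-bit value.
fn wire_schema_hash(schema_description: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    schema_description.hash(&mut hasher);
    hasher.finish()
}

/// Binaries built against different wire formats compute different
/// hashes, land on different socket paths, and therefore can never
/// talk past each other with mismatched schemas.
fn singleton_socket_path(base: &str, schema_description: &str) -> String {
    format!("{base}-{:016x}.sock", wire_schema_hash(schema_description))
}
```

With the hash baked into the path, a stale singleton built from an older wire format simply goes unnoticed at a different address instead of answering with undecodable payloads.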
**PRD.md gains:**
- "Test Architecture Today (2026-04-07)" section with 6 ASCII
diagrams: how a mount test runs today, where it breaks (10
failure modes inventoried), the orphaned-process tree on a
typical run, the test inventory by mount-source pattern, the
proposed architecture, and what changes per test category.
- "Plan: Test Quality & Stability (revised 2026-04-07)" that
REPLACES the earlier draft. New phases:
- E (50 LOC, 1 day) — `#[serde(other)]` fallback variants on
every wire enum + compile-time WIRE_SCHEMA_HASH constant
- F (30 LOC, 1 day) — schema-hash baked into singleton paths
- G (400 LOC, 2 days) — distant-test-reaper sidecar binary +
FpFixtureLease test-side struct
- H (600 LOC, 3 days) — MountedHost/MountedSsh/MountedDocker
fixtures using command-group + --watch-parent flag on
distant manager/server (Linux pdeathsig, macOS kqueue)
- I (500 LOC, 2 days) — DistantCmd builder, assert_mount_status!
macro, mock MountHandle in test-harness, dev-fast profile,
panic-hook log dump
- J (800 LOC, 4 days) — wire-format frozen fixtures, HLT-01..04
+ EVT-01..02, soak tests, per-backend probe tests, proptest
round-trips
- K (150 LOC, 1 day) — tighter nextest profile, optional test
report
- L (200 LOC, 1 day) — TESTING.md + CLAUDE.md updates
- Explicit "What we DROP from the previous draft" reconciliation
table explaining why each old item is replaced or obviated.
- Validation checklist: 5 end-to-end scenarios that must pass
without manual intervention before the refactor is complete
(idempotent reruns, SIGKILL recovery, wire-format mismatch
isolation, cargo test parity, SIGINT cleanup).
**PROGRESS.md gains** the corresponding revised checklist
under "Phases E–L", including an explicit "DROPPED from the
previous draft" section listing the eight items that don't
carry forward (cleanup scripts, build-hash sentinels,
owned-singleton opt-in, PID-locked sentinels, cross-version
compat tests, FP domain bulk reset, MountTempDir panic-hook,
preflight script).
Total estimated LOC for the revised plan: ~2730 (delta against
current harness ~+2000 net, since the per-test fixtures replace
~700 LOC of singleton machinery).
Msg<T> wraps every request/response on the wire. It was previously derived with #[serde(untagged)], which meant any failure inside T::deserialize (unknown variant, unknown field, wrong type) got collapsed into the generic "data did not match any variant of untagged enum Msg" error, hiding the real cause.

Replace the derived Deserialize with a hand-written impl that dispatches via deserialize_any + Visitor: visit_seq -> Msg::Batch, visit_map -> Msg::Single. When T::deserialize fails, the real inner error propagates unchanged. This narrows Msg<T> to map/seq payloads, the only shapes used in production: Msg<Request> and Msg<Response> are internally-tagged struct enums that always serialize as maps.

Two existing tests that round-tripped Msg<String> scalar payloads are updated to use a struct fixture. A new failure_paths submodule adds 8 regression tests covering the real production type (Msg<protocol::Request>), deny_unknown_fields interaction, batch-element failures, and round-trip preservation across JSON and MessagePack.

The on-wire bytes are unchanged: Serialize is still derived with #[serde(untagged)].
deserialize_from_slice previously returned just "Deserialize failed: <raw rmp_serde error>" with no indication of which type was being decoded or how large the payload was. When the underlying error was an untagged-enum collapse or a terse serde message, there was no way to locate the failing call site from logs.

Enrich the io::Error to include std::any::type_name::<T>() and the slice length, producing messages like:

"Failed to deserialize <fully::qualified::Type> from 1234 bytes: <e>"

Every caller that forwards the io::Error upward (UntypedRequest / UntypedResponse decode paths, FramedTransport::read_frame_as, the authentication macro, all packet::*::from_slice helpers) automatically inherits the enriched context. Combined with sub-phase 1's custom Msg<T> Deserialize, a failing decode now tells you both which type and which specific variant failed.

Also adds a doc comment with an Errors section on the helper, closing a pre-existing docs gap.
The server receive loop used to log decode failures with String::from_utf8_lossy over the raw MessagePack bytes, gated behind log_enabled!(Debug) so it only appeared with --log-level debug. At info level, the failure surfaced as just "Invalid request: <terse serde error>" with no way to correlate it with the raw payload.

Add a hex_preview helper in net/common/utils that renders the first 64 bytes of a slice as lowercase hex via hex::encode, appending "..." when truncated. Safe for binary data, no lossy UTF-8.

Rewrite both decode-error arms of the server receive loop to:
- always fire at error! level (no log_enabled! gate)
- include the byte length of the payload
- include a hex preview via utils::hex_preview
- drop the lossy String::from_utf8_lossy dump

The remaining log_enabled!(Debug) gate on the happy-path "New request" log and the log_enabled!(Trace) gate on the heartbeat loop are untouched: they are happy-path diagnostics, not errors.
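The helper itself is a few lines. A sketch using manual formatting instead of hex::encode so it carries no dependency (the real helper uses the hex crate, per the commit):

```rust
/// Renders the first 64 bytes of a slice as lowercase hex, appending
/// "..." when the slice is longer. Safe for arbitrary binary data:
/// no lossy UTF-8 conversion, no control characters in the output.
fn hex_preview(bytes: &[u8]) -> String {
    const MAX_BYTES: usize = 64;
    let shown = &bytes[..bytes.len().min(MAX_BYTES)];
    let mut out: String = shown.iter().map(|b| format!("{b:02x}")).collect();
    if bytes.len() > MAX_BYTES {
        out.push_str("...");
    }
    out
}
```

Capping at 64 bytes keeps error lines bounded while still giving enough of the frame header to identify the payload in a capture.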
Mirror the server-side rewrite from the previous commit for the client receive path in map_to_typed_mailbox. The old code gated the raw-payload dump behind log_enabled!(Trace) and used lossy UTF-8 over binary MessagePack; the always-on error! line carried only the target type name and the terse serde error.

Replace it with a single error! call that always fires and includes the target type, byte length, a hex preview of the payload via utils::hex_preview, and the real inner deserialize error. After this commit, both ends of the wire produce the same information-rich decode-error format at info log level.
Records the four-commit slice that rescoped Phase E+F into a targeted fix for buried deserialize errors in the wire protocol. Documents the Msg<T> custom Deserialize, enriched deserialize_from_slice, hex_preview helper, and rewritten server/client receive logging.
Summary
- Consolidates the six file I/O request variants into two (FileRead/FileWrite) with ReadFileOptions and WriteFileOptions, adding the range read/write support needed by mount backends
- New distant-mount crate with shared mount infrastructure: InodeTable (bidirectional inode↔path mapping with ref counting + LRU eviction), TTL caches (attr, dir, read), write-back buffers, and RemoteFs translation layer
- Mount backends:
  - fuser: default on Unix
  - cloud-filter: native File Explorer placeholders
  - nfsserve: localhost NFSv3 server fallback
  - objc2-file-provider: native Finder integration with NSFileProviderReplicatedExtension
- distant mount <MOUNT_POINT> and distant unmount <MOUNT_POINT> commands

Closes #145
Test plan
- cargo test --all-features -p distant-core -p distant-host passes (protocol consolidation)
- cargo test -p distant-mount --no-default-features passes (54 unit tests for inode table, caches, write buffers)
- cargo clippy -p distant-mount --no-default-features --features nfs passes
- cargo clippy -p distant-mount --no-default-features --features macos-file-provider passes
- distant mount /tmp/test on a Linux/macOS host with FUSE installed
- --features nfs