perf: unify cache/result path into hashed ResolverPath#231
Conversation
Tiny string-slicing parent walker and Path::join-compatible joiner. Used by the upcoming ResolverPath cache that operates on `&str` instead of `&Path` to skip OsStr↔str conversions on the hot path.
Surfaced from Cache::value when the input path is not valid UTF-8. The new ResolverPath stores paths as Box<str>, and the cache validates at the boundary instead of carrying a slow fallback for non-UTF8 paths.
ResolverPath is the unified path value used by both the internal cache and the public Resolution result. It carries: - a precomputed FxHasher u64 over the path bytes - the path as Box<str> (UTF-8) - the parent ResolverPath (cheap Arc clone) - the existing OnceCell-cached fs metadata / package.json fields Parent walks now slice &str via parent_str; module joins use join_str. Lookups skip OsStr↔str conversions on the hot path, and downstream consumers (rspack) can read the prehash directly off the returned ResolverPath instead of re-hashing the PathBuf. Cache::value gains a Result return: non-UTF8 input now surfaces ResolveError::PathNotUtf8 instead of panicking inside path_to_str.
Resolution.path is now a ResolverPath instead of PathBuf. The existing accessors (path / into_path_buf / full_path / query / fragment / package_json) keep their signatures. New accessors give the caller the underlying ResolverPath: pub fn resolver_path(&self) -> &ResolverPath pub fn into_resolver_path(self) -> ResolverPath Downstream consumers can read .hash() directly off the result and reuse it as the FxHashMap key hash, avoiding the second walk over the path bytes that today every resolver result pays.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de4faa7e16
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let trimmed = trim_one_trailing_sep(s); | ||
| let idx = last_sep_index(trimmed)?; |
There was a problem hiding this comment.
Preserve relative parent hop in parent_str
parent_str now short-circuits to None for single-component relative paths because there is no separator, but the previous Path::parent() behavior returned Some(""). That removes one ancestor step from the cache parent chain, so load_node_modules's successors(..., p.parent()) walk stops at "a" instead of continuing to ""; with a relative context (e.g. resolve("a", "pkg")), ./node_modules/pkg is no longer considered.
Useful? React with 👍 / 👎.
| if let Some(specifier) = specifier.strip_prefix(SLASH_START) { | ||
| for root in &self.options.roots { | ||
| let cached_path = self.cache.value(root); | ||
| let cached_path = self.cache.value(root).ok()?; |
There was a problem hiding this comment.
Continue scanning roots after UTF-8 conversion failures
load_roots currently uses self.cache.value(root).ok()?, which converts PathNotUtf8 into None and exits the whole function on the first invalid root. That means one non-UTF-8 options.roots entry prevents all later valid roots from being checked, producing an unexpected miss (and hiding the actual path encoding error) instead of continuing the loop.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR unifies the resolver’s internal cached path representation and the public Resolution path into a single ResolverPath that carries a precomputed FxHasher hash plus a UTF-8 string path, aiming to eliminate downstream re-hashing and reduce hot-path overhead (notably on Windows).
Changes:
- Introduces
ResolverPath(hashed, UTF-8,Arc-backed) and uses it across cache and resolution results. - Makes
Cache::valuefallible (Result<ResolverPath, ResolveError>) and addsResolveError::PathNotUtf8instead of panicking on non-UTF-8 paths. - Adds
Resolution::{resolver_path, into_resolver_path}while keeping existingResolutionpath APIs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/cache.rs | Replaces cached path type with ResolverPath, adds UTF-8 validation + precomputed hashing, updates cache keying and adds related tests. |
| src/lib.rs | Threads ResolverPath through resolver flow and updates cache lookups to handle fallible Cache::value. |
| src/resolution.rs | Stores ResolverPath inside Resolution, adds accessors for downstream hash reuse, and updates tests. |
| src/path.rs | Adds parent_str/join_str helpers to avoid repeated OsStr/Path conversions on hot paths and adds unit tests. |
| src/error.rs | Adds ResolveError::PathNotUtf8 and a display test for the new error variant. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Strip a single trailing path separator (platform-aware). Used so that | ||
| /// `parent_str("/a/b/")` walks to `"/a"` instead of `"/a/b"`. | ||
| #[inline] | ||
| fn trim_one_trailing_sep(s: &str) -> &str { | ||
| #[cfg(windows)] |
| if let Some(specifier) = specifier.strip_prefix(SLASH_START) { | ||
| for root in &self.options.roots { | ||
| let cached_path = self.cache.value(root); | ||
| let cached_path = self.cache.value(root).ok()?; |
| /// Returns the path without query and fragment. | ||
| pub fn into_path_buf(self) -> PathBuf { | ||
| self.path | ||
| self.path.path().to_path_buf() | ||
| } |
resolve_impl previously did: let path_buf = self.load_realpath(&cached_path).await?; // PathBuf let path = self.cache.value(&path_buf)?; // extra cache trip That second cache.value runs for every resolve, even when realpath() returned the input bytes unchanged (the common case: no symlink in chain). It costs one FxHasher walk + DashSet lookup + a fresh Arc<ResolverPathInner> insert per result file. With `clear_cache()` called once per bench iter and 1000 deps resolved per iter, that adds up to the 4.5% / 3.5% Simulation regressions and the 22% memory bump CodSpeed flagged on this PR. Make load_realpath return ResolverPath and short-circuit: - options.symlinks = false → reuse cached_path (pure Arc bump) - symlinks on, no symlink → compare path bytes, reuse cached_path - symlinks on, real symlink → cache.value(realpath) (unchanged path) Public API unchanged.
The first fix (6fabea5) only helped when realpath() returned the input bytes unchanged. With pnpm projects (and the resolver bench), every resolved file goes through a .pnpm/<pkg>@<ver>/... symlink, so realpath always changes the path and we always fell through to cache.value(&path_buf), which: - inserts the realpath result into the cache - builds and inserts its entire parent chain into the cache - per-iter, that's ~1000 fresh entries with full .pnpm/ parent chains → matches CodSpeed's +22% memory bump ResolverPath::shallow() builds a single Arc<ResolverPathInner> without inserting into the cache and without constructing the parent chain. Caching the realpath result was never load-bearing for correctness — each ResolverPath already memoizes its own realpath via the canonicalized OnceCell — so dropping the cache insert is purely a memory and instruction-count win. Public API is unchanged. Resolution::resolver_path().parent() returns None when symlinks resolved to a different path (same as if the caller had called .parent() on the old PathBuf result, which had no parent caching either).
Why
Resolver consumers (rspack) re-hash the returned
PathBuffor their own module-id maps. Today every resolver result pays a path hash twice — once inside the cache, once downstream. The internal cache also threads paths throughOsStr-backedPath::hash, which on Windows walks per component.This PR unifies the internal
CachedPathand the publicResolution.path: PathBufinto a singleResolverPath:FxHasheru64 over the path bytesBox<str>(UTF-8)Arcclone, same single-allocation footprint as the oldArc<CachedPathImpl>Resolutionviaresolver_path()/into_resolver_path()Downstream can plug the prehash straight into an
FxHashMapwithout re-walking the bytes.What
ResolverPathreplacesCachedPatheverywhere in the resolver and is the type carried byResolution.Resolution::resolver_path/into_resolver_path. Existingpath()/into_path_buf()/full_path()/query()/fragment()/package_json()keep their signatures — no public input API change.Cache::valuenow returnsResult<ResolverPath, ResolveError>. New error variantResolveError::PathNotUtf8(PathBuf)is raised at the boundary instead of the previouspath_to_strpanic.join_str/parent_stron&str, skippingOsStr ↔ strconversions and the per-component walk inPath::hash(the win is most visible on Windows; Unix is at parity with perf(cache): hash CachedPath by raw bytes on unix #226).Before / after
Tests
ResolverPath::hashmatchesFxHasher::write(bytes), equal lookups shareArc, parent walk matches direct lookup, non-UTF-8 input returnsPathNotUtf8.cargo check -p rspack_napi_resolver) clean.