
Commit eedc16a

feat: vendor/scan/apply overhaul — lockfile-driven discovery, verified auto-fetch, mismatch-tolerant apply (#115)
* fix(purl): percent-decode purl components from the API

The patches API serves scoped purls percent-encoded (pkg:npm/%40scope/name@1.0.0) and scan stores them verbatim as manifest keys, but neither the npm crawler nor the vendor coordinate parser decoded them — so apply/vendor reported scoped packages as 'package not installed', and detect_prunable saw every encoded entry as prunable.

- utils/purl.rs: percent_decode_purl_component (strict, all-or-nothing, fail-safe passthrough), normalize_purl + purl_eq (compare/display only, never path construction)
- npm_crawler parse_purl_components, vendor parse_npm_purl (NpmCoords now owns decoded name/version; base_purl stays verbatim for ledger/manifest key parity), parse_jsr_purl: decode AFTER /-and-@ splits, BEFORE the is_safe_* guards — %2e%2e/%2f cannot smuggle traversal
- detect_prunable + purl_matches_identifier compare normalized forms
- human output shows the decoded purl; JSON keeps verbatim keys

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): auto-force staging on content mismatch + correct already-applied events

The vendor stage is a private copy and every apply write path is hash-gated to exactly afterHash, so a beforeHash mismatch (a patch built against different bytes than the installed artifact, or a package already patched in place by apply) no longer fails the vendor: the stage is overwritten with the verified patched content and the overwrite surfaces as a vendor_content_mismatch_overwritten warning event. Missing patch-target files still fail closed without --force (force's silent NotFound skip would pack an artifact without the fix).

- shared force_apply_staged / missing_existing_patch_files / mismatch_overwrite_warnings policy helpers in vendor/mod.rs, used by all npm flavors (via stage_patch_pack) + cargo/composer/gem/pypi/golang backends; dry runs predict the same outcome
- vendor.rs: gate the already_vendored rewrite on entry.is_none() — the first vendor of an in-place-applied package now emits Applied (it packed + rewired this run) instead of a miscounted skip
- scan --vendor: pre-prompt baseline check annotates mismatched packages before the confirm prompt (best-effort, read-only)
- --force narrowed to missing-file tolerance + variant-probe bypass; CLI_CONTRACT.md documents the new warning code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): take over exact-version override pins (pnpm + yarn berry)

A user-authored override/resolution that pins the package to exactly the version being vendored (Flowise: pnpm.overrides 'tar-fs': '3.1.0') no longer refuses with vendor_override_conflict. The pin's key is kept (its spelling and quoting preserved on both pnpm surfaces — pnpm hard-requires the package.json and lock override maps to agree), its VALUE is rewritten to the file:.socket/vendor/... spec, and the pinned value is recorded as the wiring original so every revert path (--revert, reconcile, remove) restores the user's pin verbatim.

- pnpm: classify_pkg_override (Insert / Ours / Takeover) replaces the boolean conflict checks; effective key threads through EditCtx, apply_pkg_override and edit_overrides; revert restores originals in place instead of deleting. Ranges, different versions, parent>child selector chains, and duplicate same-name keys still refuse, now with a hint that exact pins are taken over.
- yarn berry: bare-name resolutions pin equal to the version is taken over symmetrically (KIND_RESOLUTION records the original).
- npm/yarn-classic/bun wire the lock only (no override surface), so no conflict exists there to take over.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(scan): prune lifecycle for vendored packages

scan --prune previously blanket-exempted vendored purls, so nothing ever cleaned unused vendored state: dropped patches kept their artifacts and overrides forever, removed dependencies stayed redirected, and orphan uuid dirs were only swept by vendor --revert. The prune pass now runs a vendored-state GC first (under the apply lock; contention degrades to a skip, never a scan failure):

(a) entries whose patch is gone from the manifest are reverted (same stale test as the vendor flows' reconcile_dropped);
(b) entries whose dependency left the lockfile graph are reverted and their manifest entries dropped, feeding the same pass's blob sweep. Per-flavor in-use probes: pnpm scans packages:/snapshots: blocks for the artifact (the mirrored overrides: declaration alone is not usage); package-lock/yarn/bun probe the lock text for the uuid dir (those flavors wire resolutions into the lock itself). None = cannot determine = keep, fail-safe; detached entries are exempt (lockfile-invisible by design);
(c) orphan .socket/vendor/<eco>/<uuid> dirs are swept (extracted from run_revert into a shared sweep_orphan_vendor_dirs).

JSON gc gains revertedVendoredEntries/removedVendorOrphanDirs (wet) and revertableVendoredEntries/vendorOrphanDirs (preview, which also mirrors the wet pass's manifest drops so blob counts agree); human output gains a GC summary line. CLI_CONTRACT.md updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: e2e coverage for encoded scoped purls, mismatch annotation, prune lifecycle

- scan_vendor_e2e: full pipeline with the API's percent-encoded scoped purl form (download -> vendor lookup against node_modules/@scope -> lock rewiring -> prune exemption); interactive pre-prompt baseline annotation + auto-force warning; scan --prune reverting an unused vendored entry (ledger + manifest + artifact + lock all reconciled)
- clippy: too_many_arguments allow on stage_patch_pack, JsrPurlParts type alias

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): lockfile inventory module for npm-family locks

Read-only inventories of the dependency set a lockfile resolves, independent of what is installed: name/version/purl plus the lock's artifact URL and content verifier (typed LockIntegrity: SRI, yarn sha1 fragment, berry cache-zip checksum, sha256 hex, go.sum h1 — the latter two for the ecosystems that follow). Powers scan's lockfile supplement and vendor's missing-package fetch.

Covers all five npm flavors via detect_npm_lock_flavor (package-lock/shrinkwrap, pnpm v9, yarn classic, yarn berry, bun). Fail-soft per entry, fail-closed per value (names/versions path-guarded; git/file/link/workspace specs and our own vendored entries excluded; duplicate instances dedup preferring a verifier). lookup() bridges percent-encoded manifest purls. Reuses the wiring backends' parsers via pub(super) visibility bumps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): registry_fetch — verified pristine-artifact fetching

Downloads the artifact a lockfile entry resolves (lock-recorded URL, else the conventional npm registry URL; SOCKET_NPM_REGISTRY override), verifies it against the lock-recorded integrity FAIL-CLOSED before any disk write (strongest hash of a multi-hash SRI; yarn sha1 fragment; sha256 hex), and extracts to a private tempdir the vendor pipeline can stage from.
Entries with no verifier are refused without any network I/O (Unverifiable). Hardening: http(s)-only, download/decompression/entry-count/entry-size caps, regular-files-only extraction with first-component strip + is_safe_relative_subpath (fail-closed on traversal-bearing tarballs, nothing half-extracts), exec bits preserved so the deterministic re-pack keeps bin scripts executable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor,scan): auto-fetch missing packages + lockfile/ledger discovery

vendor: a manifest patch whose package has no installed copy is now satisfied automatically (no flag) instead of failing with package_not_installed:

- already-vendored purls stage from their own committed artifact, sha256-verified against the vendor ledger (fresh-clone re-vendor and in-sync runs work offline, no registry traffic);
- otherwise the lockfile-resolved pristine artifact is fetched (lock-recorded URL else conventional registry URL), verified against the lock's integrity FAIL-CLOSED, and staged from a private tempdir — the project tree is never touched.

Reason codes: vendor_fetched_missing (skip-warning beside the Applied event), vendor_fetch_failed (distinct Failed, suppresses the duplicate not-installed skip), vendor_fetch_unverifiable (no lock integrity → calm skip). --offline keeps the calm skip and names the lockfile as the would-be source.

scan: discovery now supplements the installed-tree crawl with

(a) lockfile-only dependencies — warned '[NOT INSTALLED]' in the table + a stderr note, JSON lockfileOnlyPackages + packages[].notInstalled, counted as scanned so a wiped node_modules no longer prunes lockfile-listed entries, partitioned out of --apply BEFORE download (calm skipped/package_not_installed records, exit 0, no manifest writes) while --vendor passes them to the auto-fetch; and
(b) vendored-ledger entries — the committed artifact IS the dependency, so updates[] detection and scan --vendor keep working on a fresh clone before any install.

scan --json --vendor now vendors a completely fresh clone end-to-end (e2e-proven, second run already_vendored).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): yarn berry checksum-verified fetch + ledger artifact staging tests

Berry locks never hash the tarball — the checksum is sha512 of the deterministic cache zip. The fetch rebuilds that zip from the fetched bytes via the same spike-pinned berry_zip recipe the wiring uses and compares the 10c0/<hex> value fail-closed (foreign cacheKeys are Unverifiable). Plus unit coverage for stage_local_artifact's ledger-sha256 gate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): cargo + golang lockfile inventory and verified fetch

- Cargo.lock [[package]] inventory: crates.io-sourced entries carry their sha256 .crate checksum (Sha256Hex); workspace members skipped, git/custom-registry sources discovery-only. Fetch from static.crates.io (SOCKET_CRATES_REGISTRY override), verify, extract ({name}-{version}/ top dir) — feeds vendor_cargo_crate's pristine_src.
- go.sum inventory: module-zip h1: lines (the /go.mod manifests-only lines skipped).
  Fetch from the module proxy (SOCKET_GOPROXY, else the standard GOPROXY's first non-direct element, else proxy.golang.org) with Go's !uppercase path escaping; verify the dirhash Hash1/HashZip in memory BEFORE extraction (algorithm validated against a live sum.golang.org lookup for golang.org/x/text@v0.14.0); extract with the explicit module@version/ prefix (module paths contain slashes, so a first-component strip would be wrong) — feeds vendor_go_module.
- lookup() generalized across ecosystems; inventory_project() returns the union the scan supplement and vendor auto-fetch consume.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): composer + gem + pypi lockfile inventory and verified fetch

- composer.lock packages[]/packages-dev[]: zip dists with their sha1 shasum (frequently empty → discovery-only); names lowercased to the packagist form, pretty leading v dropped; path dists (ours) excluded. Fetch verifies sha1 and strips the variable zipball top dir.
- Gemfile.lock GEM/specs + bundler 2.6 CHECKSUMS sha256 (older locks discovery-only); the GEM remote drives the /downloads/<gem> URL. Platform-suffixed specs skipped (unsupported for vendoring). The fetched .gem (plain tar) is sha256-verified whole, then data.tar.gz extracts at the root (no prefix strip).
- pypi: uv.lock registry packages with a pure py3-none-any wheel carry a fetchable URL + sha256; poetry.lock and ==-pinned requirements.txt contribute discovery-only entries (PEP 503-normalized names). The unzipped wheel is a site-packages-shaped stage for the pypi backend.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(scan): all-ecosystem lockfile supplement + docs

scan's lockfile supplement now consumes inventory_project (npm-family, Cargo.lock, go.sum, composer.lock, Gemfile.lock, uv/poetry/requirements) with per-ecosystem counts; the vendor auto-fetch pass likewise serves every inventoried ecosystem.
CLI_CONTRACT.md gains the lockfile-supplement and vendor-auto-fetch sections + the three reason codes; README notes the fresh-clone flow; the exact-shape empty-scan contract test pins the additive lockfileOnlyPackages field; the cargo build e2e scrubs ambient CARGO_TARGET_DIR from child builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(apply): beforeHash mismatch warns and applies the full blob by default; --strict restores the hard error

A file whose on-disk content matches NEITHER the patch's beforeHash nor its afterHash previously hard-failed the in-place apply (the flatted case: a patch built against non-registry bytes made plain apply unusable). The default now overwrites such files with the FULL verified patched content and continues:

- core: apply_package_patch's force bool becomes MismatchPolicy {Warn (default) | Strict | Force}. Warn promotes HashMismatch to Ready keeping the warning signature (expected/current hashes); the diff strategy self-disables on a wrong base (partial patches are skipped, as they must be) and the archive/blob writes stay hash-gated to exactly afterHash — a tolerated mismatch lands verified patched bytes or fails, never silent corruption. Missing pre-existing files still fail closed (only Force skips them).
- CLI: global --strict (env SOCKET_STRICT) restores the fail-closed behavior across apply/get/scan --apply/the hook/go redirects (--force overrides it); plumbed through DownloadParams into the nested applies. Vendor staging is unaffected (already auto-forces into its private stage).
- Each overwrite logs a content_mismatch_overwritten warning to stderr and rides the JSON envelope as a Skipped warning event beside the package's Applied event.
- Since the full content lives in the afterHash blob and the default --download-mode diff may not have staged it, a pre-apply pass probes for mismatches and downloads the missing blobs by hash (offline runs warn and let those files fail).

Live-verified: pristine flatted@3.3.1 + its bad-baseline patch now applies 6/6 files via blob with per-file warnings (exit 0); apply --strict exits 1 with the old error and leaves files untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* polish(apply): decode percent-encoded purls in human output

The 'Patched packages' summary and the no-matching-installed-package warning printed manifest keys verbatim (pkg:npm/%40scope/...); show the decoded form like the scan/vendor output does. JSON keeps verbatim keys.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(vendor): hold patch blobs in memory — vendoring writes no .socket/blobs or temp files

Vendor flows (vendor, scan --vendor, --detached) no longer persist patch content anywhere on disk: a vendored project's .socket holds only manifest.json and vendor/.

- core: PatchSources.mem_blobs overlay, checked before the on-disk blob read in the apply pipeline's blob strategy.
- core: harvest_artifact_blobs — re-stage afterHash blobs from the committed vendor artifact itself (uuid-matched against the ledger, every blob self-verified by its own git-sha256), so in-sync re-runs and fresh clones of vendored projects stage with no network.
- cli: stage_vendor_sources_in_memory replaces the disk stager in all vendor flows; missing content is fetched per patch via the proxy-aware patch-view endpoint straight into memory.
- cli: DownloadParams.persist_blobs — scan passes !args.vendor so the scan --vendor download phase writes only the manifest.
- e2e: .socket-stays-lean assertions (manifest mode, detached, fresh clone) + no-blobs detached idempotency; core harvest unit tests (tgz, dir-shaped, stale-uuid, escaping-path fail-closed).
- docs: CLI contract "Patch sources stay in memory" section.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(repair): rebuild missing/corrupt vendored artifacts + no-ledger reconstruction

repair now owns the vendored-artifact lifecycle: artifacts referenced by the ledger and/or rewired lockfiles but missing or corrupt on disk are rebuilt fail-closed, and a wholesale-deleted .socket/vendor (state.json included) is reconstructed from the lockfile references alone.

Core:
- ArtifactHealth + check_vendored_artifact (vendor/verify.rs): per-file afterHashes plus a whole-file sha256 cross-check against the ledger for file-shaped artifacts.
- recover_lock_entry (lock_inventory): the pre-vendor registry fragment recovered from the wiring originals (npm/pnpm/yarn/berry/bun fragments, composer dist, gem checksum line, uv wheel, cargo entry.lock); golang rides the unrewired go.sum.
- wired_vendor_integrity + fetch_npm_unverified + artifact_matches_integrity: the REWIRED lock's recorded integrity of our packed tarball is the trust anchor — reconstruction can fetch pristine unverified and still land only bytes that reproduce the wired integrity (tamper => removed, fail).
- Artifact-only rebuild branches in the composer/cargo/gem/golang/pypi backends: wired-but-broken artifacts rebuild in place with NO lock write and NO ledger re-record (fixes the latent original-clobbering full-path re-run); golang in-sync re-runs now record nothing; uv same-uuid re-runs are an InSync hot path instead of a refusal.
- pnpm: fail-closed duplicate-mapping-key guard for half-edited locks in edit_packages/edit_snapshot_rekey.
- Memory stager: a diff archive alone is no longer a sufficient vendor source (auto-force can need full after-blobs a diff cannot produce).

CLI:
- repair_vendor.rs: ledger health pass, lockfile-reference reconstruction (uuid recovered from the contract's path rule; manifest record else the patch view API => detached entry with the record embedded), rebuilds via the normal vendor dispatch + the pristine-source ladder, post-verified against the recorded fingerprint. Offline rebuilds run when fully local.
- repair: manifest_not_found softened when vendor traces exist; step 1 skips vendored/lockfile-referenced entries (a vendored project's repair never re-litters .socket/blobs|diffs).
- vendor auto-fetch: a MISSING committed artifact falls through to the ledger-recovered registry fetch instead of failing; corrupt stays loud.
- Envelope: PatchAction::Rebuilt + summary.rebuilt (omitted while zero).

Tests: repair_vendor_e2e (12 scenarios incl. tampered-pristine rejection, offline both ways, detached, no-ledger and no-manifest reconstruction), per-backend wired-missing-copy rebuilds, health matrix, fragment recovery per wiring kind, pnpm colon-key scanner unit, half-drifted lock guard.

Live-verified on Flowise: 19/19 fresh vendor with a lean .socket, deleted artifact rebuilt byte-identically, and a 14/14 full reconstruction from nothing but the rewired pnpm lockfile.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(pnpm): bind vendor edits to name@VERSION — multi-version vendoring corrupted the lock

Every "ours" probe in the pnpm backend matched ANY same-name .socket/vendor path, so a project vendoring the same package at several versions (Flowise: three fast-xml-parser, five minimatch patches) had each version's edit treat its siblings' entries as its own stale wiring: override values were clobbered to the wrong tarball and packages/snapshots rekeys spliced duplicated mapping keys — which pnpm hard-rejects (ERR_PNPM_BROKEN_LOCKFILE), discovered live when repair's reconstruction re-dispatched all versions in sequence.

- EditCtx::is_ours / both is_ours_key block probes / the override classification + lock-side mirror check now require the vendor path's leaf to be THIS name-version.tgz (any uuid — stale-uuid refresh unchanged); sibling-version vendored entries are skipped as coexisting.
- edit_packages/edit_snapshot_rekey fail closed when BOTH the registry-keyed and our file:-keyed entry exist (a half-edited lock): refusing beats splicing a duplicate key.
- Regression tests: multi-version vendor coexistence (per-section duplicate-key audit), integrity-drift refresh stays single-keyed, half-drifted duplicate guard.

Live-verified on Flowise end to end: scan --vendor (16/16) → pnpm install --frozen-lockfile → rm -rf .socket → repair (16/16 reconstructed from the lockfile alone) → frozen install again, exit 0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(docker): pin scan --sync to --strict where apply --force must stay the writer

The cargo/composer/golang/maven/nuget docker chains use an all-zeros beforeHash fixture and assert that the dedicated `apply --force` step is the one that patches (exactly one applied, skipped:0, marker written by apply). The new mismatch-warn default makes `scan --sync` overwrite the mismatched file with the verified blob during the scan itself, so the later apply reported already_patched and every gate failed. `--strict` restores the hard-error scan these scripts encode; the warn-overwrite default keeps its coverage in the wiremock apply/scan suites and the deno/gem/pypi docker chains (real beforeHashes).

The remaining red CI (3-OS test, test-release, coverage, deno/pypi docker baselines) was the live patches API returning 503 "Service temporarily over capacity" during the run — transient, recovered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(docker): pin the content_mismatch_overwritten warning in the force-apply gates

With scan pinned to --strict, the dedicated `apply --force` step is the writer again — but force-overwriting the all-zeros-baseline fixture now also surfaces the content_mismatch_overwritten warning as a Skipped event, so the old `skipped:0` gates fail. Assert the new contract instead: exactly one skip AND the warning's errorCode present (cargo/golang/maven/nuget×2; composer has no skipped gate). All five suites verified locally against freshly built images.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
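
The purl fix in the message above describes a decoder that is "strict, all-or-nothing, fail-safe passthrough". A minimal sketch of that idea follows, under the assumption that it means: decode every `%XX` escape, and hand back the input untouched if any escape is malformed or the decoded bytes are not valid UTF-8. The name `decode_component_or_passthrough` is hypothetical; this is not the crate's `percent_decode_purl_component`, whose source is not shown on this page.

```rust
/// Hypothetical sketch: decode every `%XX` escape in `input`, or return the
/// input unchanged if any escape is malformed or the result is not UTF-8.
pub fn decode_component_or_passthrough(input: &str) -> String {
    // Map one ASCII hex digit to its value.
    fn hex_val(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(b - b'a' + 10),
            b'A'..=b'F' => Some(b - b'A' + 10),
            _ => None,
        }
    }

    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Two hex digits must follow '%'; anything else aborts the decode.
            let hi = bytes.get(i + 1).copied().and_then(hex_val);
            let lo = bytes.get(i + 2).copied().and_then(hex_val);
            match (hi, lo) {
                (Some(h), Some(l)) => {
                    out.push(h * 16 + l);
                    i += 3;
                }
                // All-or-nothing: one bad escape means no decoding at all.
                _ => return input.to_string(),
            }
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Invalid UTF-8 after decoding also falls back to the verbatim input.
    String::from_utf8(out).unwrap_or_else(|_| input.to_string())
}
```

Note that `%2e%2e` decodes to `..` in this sketch, which illustrates why the message insists on decoding before the `is_safe_*` guards run rather than after.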
1 parent ac1fdd6 commit eedc16a

52 files changed

Lines changed: 11988 additions & 710 deletions

Large commits have some content hidden by default.

Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 4 additions & 1 deletion
@@ -233,7 +233,10 @@ socket-patch scan -g
 # Scan + apply + emit an OpenVEX attestation in one pass
 socket-patch scan --json --sync --yes --vex socket.vex.json

-# Vendor every patched dependency (committable; see the vendor command)
+# Vendor every patched dependency (committable; see the vendor command).
+# Works on a completely fresh clone: dependencies listed in the lockfile
+# but not yet installed are fetched pristine from their registry and
+# integrity-verified against the lockfile before vendoring.
 socket-patch scan --json --vendor --yes

 # Same, but keep the manifest out of it entirely
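
The README change above mentions integrity verification against the lockfile, and the commit message says the fetch uses the "strongest hash of a multi-hash SRI". As a rough illustration only, here is one way such a selection could look. The function name `strongest_sri_entry` and the ranking (sha512 > sha384 > sha256 > sha1) are assumptions for this sketch, not the crate's code; the actual digest comparison is left out because it needs a hashing crate.

```rust
/// Hypothetical helper: pick the strongest `algo-base64` entry from an
/// SRI-style integrity string such as "sha1-... sha512-...".
/// Returns `None` when no entry uses a recognized algorithm.
pub fn strongest_sri_entry(integrity: &str) -> Option<(&str, &str)> {
    // Assumed ranking for this sketch; higher is stronger.
    fn rank(algo: &str) -> Option<u8> {
        match algo {
            "sha512" => Some(4),
            "sha384" => Some(3),
            "sha256" => Some(2),
            "sha1" => Some(1),
            _ => None, // unknown algorithms are ignored, never trusted
        }
    }

    integrity
        .split_whitespace()
        .filter_map(|entry| {
            let (algo, digest) = entry.split_once('-')?;
            // Drop any "?options" suffix and reject empty digests.
            let digest = digest.split('?').next().unwrap_or("");
            if digest.is_empty() {
                return None;
            }
            Some((rank(algo)?, algo, digest))
        })
        .max_by_key(|(r, _, _)| *r)
        .map(|(_, algo, digest)| (algo, digest))
}
```

A `None` result would correspond to the "no verifier" case, which the commit message says is refused before any network I/O.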

crates/socket-patch-cli/CLI_CONTRACT.md

Lines changed: 64 additions & 7 deletions
Large diffs are not rendered by default.

crates/socket-patch-cli/Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@ setup-e2e = []

 [dev-dependencies]
 sha2 = { workspace = true }
+# scan_vendor_e2e builds pristine registry tarballs for the auto-fetch tests.
+tar = { workspace = true }
+flate2 = { workspace = true }
 hex = { workspace = true }
 wiremock = { workspace = true }
 portable-pty = { workspace = true }

crates/socket-patch-cli/src/args.rs

Lines changed: 14 additions & 0 deletions
@@ -144,6 +144,19 @@ pub struct GlobalArgs {
     )]
     pub offline: bool,

+    /// Treat a beforeHash mismatch as a hard error. By DEFAULT a file whose
+    /// on-disk content matches neither the patch's beforeHash nor its
+    /// afterHash is overwritten with the full verified patched content and
+    /// surfaced as a stderr warning (`content_mismatch_overwritten`); this
+    /// flag restores the fail-closed behavior. `--force` overrides it.
+    #[arg(
+        long,
+        env = "SOCKET_STRICT",
+        default_value_t = false,
+        value_parser = parse_bool_flag,
+    )]
+    pub strict: bool,
+
     /// Operate on globally-installed packages.
     #[arg(
         long = "global",
@@ -378,6 +391,7 @@ impl Default for GlobalArgs {
             ecosystems: None,
             download_mode: "diff".to_string(),
             offline: false,
+            strict: false,
             global: false,
             global_prefix: None,
             json: false,
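
The doc comment on `--strict` says that `--force` overrides it. A small standalone sketch of that precedence follows. `MismatchPolicy` here is a local stand-in that borrows the variant names from the commit; `resolve_policy` and `overwrites_on_mismatch` are hypothetical names used only for illustration.

```rust
/// Local stand-in for the core crate's `MismatchPolicy`
/// (variant names taken from the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MismatchPolicy {
    Warn,
    Strict,
    Force,
}

/// Resolve the policy from the two flags: force wins, then strict, else warn.
pub fn resolve_policy(force: bool, strict: bool) -> MismatchPolicy {
    match (force, strict) {
        (true, _) => MismatchPolicy::Force,
        (false, true) => MismatchPolicy::Strict,
        (false, false) => MismatchPolicy::Warn,
    }
}

/// Hypothetical helper: may a hash-mismatched file be overwritten?
/// Only the strict policy refuses; force is a superset of the default.
pub fn overwrites_on_mismatch(policy: MismatchPolicy) -> bool {
    !matches!(policy, MismatchPolicy::Strict)
}
```

Passing both flags therefore behaves like `--force` alone, so a `SOCKET_STRICT` set in the environment does not block an explicit forced apply.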

crates/socket-patch-cli/src/commands/apply.rs

Lines changed: 165 additions & 26 deletions
@@ -6,8 +6,111 @@ use socket_patch_core::crawlers::{
 use socket_patch_core::manifest::operations::read_manifest;
 use socket_patch_core::manifest::schema::PatchRecord;
 use socket_patch_core::patch::apply::{
-    apply_package_patch, verify_file_patch, ApplyResult, PatchSources, VerifyStatus,
+    apply_package_patch, verify_file_patch, ApplyResult, MismatchPolicy, PatchSources, VerifyStatus,
 };
+/// Files whose pre-apply content matched NEITHER hash and were (or would
+/// be) overwritten with the verified patched content — the promoted
+/// verify signature `apply_package_patch` leaves behind under the default
+/// mismatch policy.
+pub(crate) fn mismatch_overwritten_files(result: &ApplyResult) -> Vec<String> {
+    result
+        .files_verified
+        .iter()
+        .filter(|v| {
+            v.status == VerifyStatus::Ready
+                && v.expected_hash.is_some()
+                && v.current_hash != v.expected_hash
+        })
+        .map(|v| v.file.clone())
+        .collect()
+}
+
+/// Surface one mismatch-overwrite per file on stderr (human mode).
+fn warn_mismatch_overwrites(result: &ApplyResult, common: &GlobalArgs) {
+    if common.json || common.silent {
+        return;
+    }
+    for file in mismatch_overwritten_files(result) {
+        eprintln!(
+            "Warning (content_mismatch_overwritten): {} {file} did not match the patch's \
+             expected original content; applied the full verified patched content instead \
+             (pass --strict to fail on mismatches)",
+            socket_patch_core::utils::purl::normalize_purl(&result.package_key)
+        );
+    }
+}
+
+/// The default mismatch policy applies the FULL patched content for
+/// mismatched files — and the full content lives in the afterHash blob,
+/// which the default `--download-mode diff` may not have staged. Probe the
+/// in-scope packages for mismatches and fetch the missing afterHash blobs
+/// by hash (online only) so the apply below can fall through diff → blob.
+async fn ensure_blobs_for_mismatches(
+    args: &ApplyArgs,
+    manifest: &socket_patch_core::manifest::schema::PatchManifest,
+    all_packages: &HashMap<String, std::path::PathBuf>,
+    blobs_path: &Path,
+) {
+    if args.common.strict && !args.force {
+        return; // strict fails on mismatch — nothing to fetch
+    }
+    let mut needed: std::collections::HashSet<String> = std::collections::HashSet::new();
+    for (purl, pkg_path) in all_packages {
+        let Some(record) = manifest.patches.get(purl) else {
+            continue;
+        };
+        for (file_name, info) in &record.files {
+            if info.before_hash.is_empty() {
+                continue;
+            }
+            let verify = verify_file_patch(pkg_path, file_name, info).await;
+            if verify.status == socket_patch_core::patch::apply::VerifyStatus::HashMismatch
+                && tokio::fs::metadata(blobs_path.join(&info.after_hash))
+                    .await
+                    .is_err()
+            {
+                needed.insert(info.after_hash.clone());
+            }
+        }
+    }
+    if needed.is_empty() {
+        return;
+    }
+    if args.common.offline {
+        if !args.common.silent && !args.common.json {
+            eprintln!(
+                "Warning: {} mismatched file(s) need their full patched blob, but --offline \
+                 prevents fetching; those files will fail to apply",
+                needed.len()
+            );
+        }
+        return;
+    }
+    if !args.common.silent && !args.common.json {
+        eprintln!(
+            "Downloading {} full patched blob(s) for mismatched file(s)...",
+            needed.len()
+        );
+    }
+    let (client, _) = get_api_client_with_overrides(args.common.api_client_overrides()).await;
+    let _ = socket_patch_core::api::blob_fetcher::fetch_blobs_by_hash(
+        &needed, blobs_path, &client, None,
+    )
+    .await;
+}
+
+/// The mismatch policy this run applies with: `--force` ⊃ default
+/// (adds the missing-file skip), `--strict` restores fail-closed.
+pub(crate) fn mismatch_policy(force: bool, strict: bool) -> MismatchPolicy {
+    if force {
+        MismatchPolicy::Force
+    } else if strict {
+        MismatchPolicy::Strict
+    } else {
+        MismatchPolicy::Warn
+    }
+}
+
 #[cfg(feature = "golang")]
 use socket_patch_core::patch::go_redirect::{
     apply_go_redirect, reconcile_go_redirects, verify_go_redirect_state,
@@ -102,7 +205,7 @@ async fn try_local_go_apply(
     patch: &PatchRecord,
     sources: &PatchSources<'_>,
     common: &GlobalArgs,
-    force: bool,
+    policy: MismatchPolicy,
 ) -> Option<ApplyResult> {
     if !is_local_go(purl, common) {
         return None;
@@ -126,7 +229,7 @@ async fn try_local_go_apply(
             sources,
             Some(&patch.uuid),
             common.dry_run,
-            force,
+            policy,
         )
         .await,
     )
@@ -139,7 +242,7 @@ async fn try_local_go_apply(
     _patch: &PatchRecord,
     _sources: &PatchSources<'_>,
     _common: &GlobalArgs,
-    _force: bool,
+    _policy: MismatchPolicy,
 ) -> Option<ApplyResult> {
     None
 }
@@ -538,6 +641,21 @@ pub async fn run(args: ApplyArgs) -> i32 {
                 }
                 for result in &results {
                     env.record(result_to_event(result, args.common.dry_run));
+                    // Mismatch overwrites ride as Skipped warning events
+                    // (same pattern as the vendor warnings): the package's
+                    // Applied event stands, the warning is per-file.
+                    for file in mismatch_overwritten_files(result) {
+                        env.record(
+                            PatchEvent::new(PatchAction::Skipped, result.package_key.clone())
+                                .with_reason(
+                                    "content_mismatch_overwritten",
+                                    format!(
+                                        "{file} did not match the patch's expected original \
+                                         content; the full verified patched content was applied"
+                                    ),
+                                ),
+                        );
+                    }
                     // Sidecar records live on the envelope, not on
                     // individual events. Consumers iterate
                     // `envelope.sidecars[]` and JOIN against
@@ -609,9 +727,16 @@ pub async fn run(args: ApplyArgs) -> i32 {
                     } else {
                         format!(" (via {})", tags.join("+"))
                     };
-                    println!(" {}{}", result.package_key, suffix);
+                    println!(
+                        " {}{}",
+                        socket_patch_core::utils::purl::normalize_purl(&result.package_key),
+                        suffix
+                    );
                 } else if all_files_already_patched(result) {
-                    println!(" {} (already patched)", result.package_key);
+                    println!(
+                        " {} (already patched)",
+                        socket_patch_core::utils::purl::normalize_purl(&result.package_key)
+                    );
                 }
             }
         }
@@ -888,6 +1013,7 @@ async fn apply_patches_inner(
     }

     // Apply patches
+    ensure_blobs_for_mismatches(args, &manifest, &all_packages, &blobs_path).await;
     let mut has_errors = false;

     // Group release-variant PURLs by base. PyPI (`?artifact_id=`),
@@ -969,6 +1095,7 @@ async fn apply_patches_inner(
                     blobs_path: &blobs_path,
                     packages_path: Some(&packages_path),
                     diffs_path: Some(&diffs_path),
+                    mem_blobs: None,
                 };
                 let result = apply_package_patch(
                     variant_purl,
@@ -977,10 +1104,11 @@ async fn apply_patches_inner(
                     &sources,
                     Some(&patch.uuid),
                     args.common.dry_run,
-                    args.force,
+                    mismatch_policy(args.force, args.common.strict),
                 )
                 .await;

+                warn_mismatch_overwrites(&result, &args.common);
                 // A variant that reached apply is the installed distribution
                 // (it passed the first-file check, or `--force` bypassed it),
                 // so record it as matched whether or not the patch succeeded.
@@ -1052,32 +1180,40 @@ async fn apply_patches_inner(
             blobs_path: &blobs_path,
             packages_path: Some(&packages_path),
             diffs_path: Some(&diffs_path),
+            mem_blobs: None,
         };
         // Local go redirects to a project-local patched copy under
         // `.socket/go-patches/` wired via a `go.mod` `replace` (the module
         // cache is `go.sum`-verified, so in-place patching can't build).
         // Everything else — npm/pypi/gem and cargo (vendored or registry
         // cache) — patches in place via `apply_package_patch`. Without the
         // `golang` feature `try_local_go_apply` is an inert `None`.
-        let result =
-            match try_local_go_apply(purl, pkg_path, patch, &sources, &args.common, args.force)
+        let result = match try_local_go_apply(
+            purl,
+            pkg_path,
+            patch,
+            &sources,
+            &args.common,
+            mismatch_policy(args.force, args.common.strict),
+        )
+        .await
+        {
+            Some(r) => r,
+            None => {
+                apply_package_patch(
+                    purl,
+                    pkg_path,
+                    &patch.files,
+                    &sources,
+                    Some(&patch.uuid),
+                    args.common.dry_run,
+                    mismatch_policy(args.force, args.common.strict),
+                )
                 .await
-            {
-                Some(r) => r,
-                None => {
-                    apply_package_patch(
-                        purl,
-                        pkg_path,
-                        &patch.files,
-                        &sources,
-                        Some(&patch.uuid),
-                        args.common.dry_run,
-                        args.force,
-                    )
-                    .await
-                }
-            };
+            }
+        };

+        warn_mismatch_overwrites(&result, &args.common);
         if !result.success {
             has_errors = true;
             if !args.common.silent && !args.common.json {
@@ -1111,7 +1247,10 @@ async fn apply_patches_inner(
                 unmatched.len()
             );
             for purl in &unmatched {
-                eprintln!(" - {}", purl);
+                eprintln!(
+                    " - {}",
+                    socket_patch_core::utils::purl::normalize_purl(purl)
+                );
             }
         }

@@ -1289,7 +1428,7 @@ mod tests {
             .enumerate()
             .map(|(i, status)| VerifyResult {
                 file: format!("package/f{i}.js"),
-                status: status.clone(),
+                status: *status,
                 message: None,
                 current_hash: None,
                 expected_hash: None,
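
The `mismatch_overwritten_files` filter added at the top of this file relies on a three-part signature: status `Ready`, an expected hash present, and a current hash that differs from it. The standalone sketch below reproduces that predicate on local stand-in types so it can be exercised outside the crate. `VerifyStatus` and `VerifyResult` here carry only the fields the filter reads, and `overwritten_files` and `row` are hypothetical names introduced for illustration.

```rust
/// Stand-ins for the core crate's types; only what the filter needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerifyStatus {
    Ready,
    HashMismatch,
}

#[derive(Debug, Clone)]
pub struct VerifyResult {
    pub file: String,
    pub status: VerifyStatus,
    pub current_hash: Option<String>,
    pub expected_hash: Option<String>,
}

/// Files marked `Ready` that still carry an expected hash different from
/// their current hash, i.e. the promoted-mismatch signature.
pub fn overwritten_files(results: &[VerifyResult]) -> Vec<String> {
    results
        .iter()
        .filter(|v| {
            v.status == VerifyStatus::Ready
                && v.expected_hash.is_some()
                && v.current_hash != v.expected_hash
        })
        .map(|v| v.file.clone())
        .collect()
}

/// Hypothetical helper for building one result row in examples and tests.
pub fn row(
    file: &str,
    status: VerifyStatus,
    current: Option<&str>,
    expected: Option<&str>,
) -> VerifyResult {
    VerifyResult {
        file: file.to_string(),
        status,
        current_hash: current.map(str::to_string),
        expected_hash: expected.map(str::to_string),
    }
}
```

A row that is `Ready` with matching hashes, or with no expected hash at all, is not reported, and neither is a row still in `HashMismatch`; only the promoted case produces a warning.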
