Skip to content

Conversation

@jrray
Copy link
Collaborator

@jrray jrray commented Oct 1, 2025

Blob files contain the digest and size of the payload. The digest is redundant because it is the same as the file name of the blob. The size is also available in the Entry type in Manifests. Reading a blob to learn the digest of the payload requires already knowing the payload digest in order to read the blob.

The invariant of a blob not existing unless the payload also exists has proven to be difficult to maintain. By eliminating blobs, we eliminate this invariant.

@jrray jrray self-assigned this Oct 1, 2025
@jrray jrray added the enhancement New feature or request label Oct 1, 2025
@rydrman
Copy link
Collaborator

rydrman commented Oct 1, 2025

From the meeting today:

  • no concerns about this change
  • flatbuffers, the data structure is still used and nested in entries...
  • possibly a clean or migrate command that allows us to remove these once spfs is updated
  • how do I get metadata about a payload without a read_object on the digest
    • we'll need to handle this kind of thing seamlessly in commands like spfs info, but not necessarily keep supporting this as a universal API behaviour
  • consider what the syncer will do during this migration - where a server has been updated to the blob-file-less version of spfs and the client has not.

@jrray jrray force-pushed the push-volovwxpnxry branch 2 times, most recently from c69b0d5 to bca6bae Compare October 17, 2025 00:11
@jrray jrray marked this pull request as draft October 17, 2025 02:41
@jrray jrray marked this pull request as ready for review October 20, 2025 06:00
@jrray jrray requested a review from Copilot October 20, 2025 06:00
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Removes separate blob objects and associated invariants, consolidating storage around payload digests directly. Introduces payload-focused APIs (commit_payload, sync_object_digest / sync_payload), adds size and conditional removal operations for payloads, and extends digest search to return both object and payload kinds (FoundDigest). Refactors reporting via a new Summary trait and updates CLI commands and checking/cleaning logic to work without blob objects.

  • Replace blob commit/sync/read APIs with payload equivalents (commit_payload, sync_payload, sync_object_digest)
  • Extend database/search and protocol types to distinguish object vs payload digests (FoundDigest); remove unsafe blob/payload coupling
  • Add payload management (payload_size, remove_payload_if_older_than) and adjust check/clean logic and CLI tools accordingly

Reviewed Changes

Copilot reviewed 62 out of 62 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
crates/spk-storage/src/storage/spfs.rs Switch tag writes from commit_blob to commit_payload (tags now point to payload digests).
crates/spk-storage/src/fixtures.rs Test fixture updated to use commit_payload.
crates/spk-exec/src/exec.rs Rename sync_digest to sync_object_digest.
crates/spk-cli/group3/src/cmd_import.rs Import Summary trait for reporting.
crates/spk-cli/cmd-debug/src/cmd_debug.rs Update sync call to sync_object_digest.
crates/spfs/src/sync_test.rs Adapt tests to payload-centric API (has_payload, commit_payload).
crates/spfs/src/sync/reporter.rs Introduce Summary trait and unify result enums (add Payload variants).
crates/spfs/src/sync.rs Core sync logic updated to distinguish object vs payload digests; remove unsafe payload sync.
crates/spfs/src/storage/tar/repository.rs Implement new PayloadStorage signatures (no unsafe, add payload_size, conditional removal).
crates/spfs/src/storage/rpc/payload.rs RPC payload service gains size and conditional removal calls.
crates/spfs/src/storage/rpc/database.rs Update find_digests to return FoundDigest.
crates/spfs/src/storage/repository.rs Remove BlobStorage, add commit_payload helper.
crates/spfs/src/storage/proxy/* Proxy repository updated for payload_size and conditional removal.
crates/spfs/src/storage/payload.rs Remove unsafe write_data, add payload_size and conditional removal API.
crates/spfs/src/storage/platform.rs / layer.rs / manifest.rs Adjust iterators for DatabaseItem (object vs payload).
crates/spfs/src/storage/fs/* Filesystem backends updated for payload operations and digest search returning FoundDigest.
crates/spfs/src/server/* Server RPC services extended for payload size and conditional removal; digest streaming updated.
crates/spfs/src/graph/database.rs Introduce FoundDigest & DatabaseItem abstractions; adjust iteration and search API.
crates/spfs/src/check.rs & tests Refactor check logic to payload-only model; remove blob-specific results.
crates/spfs/src/clean.rs Cleaner updated to process DatabaseItem (objects & payloads).
crates/spfs/src/runtime/storage.rs Internal runtime now commits payloads (remove blob object creation).
crates/spfs/src/proto/* Protocol definitions extended (FoundDigest, payload size, conditional removal).
crates/spfs-cli/main/src/* CLI commands adapted for payload digests (read/info/check/push/pull).
Cargo.toml files Add test-fixtures feature; include criterion & rstest changes.
Removed blob.rs Remove obsolete BlobStorage abstractions.
Comments suppressed due to low confidence (2)

crates/spfs/src/check.rs:941

  • valid_payload_bytes is never incremented for Ok or Repaired payloads, so payload size totals remain zero. Use repo.payload_size(digest) when constructing summaries (pass size into summary) or account for size at accumulation time.
    pub fn summary(&self) -> CheckSummary {
        match self {
            Self::Missing(digest) => CheckSummary {
                missing_payloads: Some(*digest).into_iter().collect(),
                ..Default::default()
            },
            Self::Repaired => CheckSummary {
                valid_payloads: 1,
                repaired_payloads: 1,
                ..Default::default()
            },
            Self::Ok => CheckSummary {
                valid_payloads: 1,
                ..Default::default()
            },
        }
    }

crates/spfs/src/check.rs:1

  • ConsoleCheckReporter treats all item results as 'objects' but now CheckItemResult can represent payloads (Payload variant); differentiate counters for object vs payload to maintain accurate reporting semantics.
// Copyright (c) Contributors to the SPK project.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +567 to +568
// XXX: this ignores must_be_older_than
return ready(Ok(ready(Ok((item, true))).boxed()));
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dry_run mode removed is always set to true, ignoring must_be_older_than filtering; this misreports items as removable. Compute whether the item would be removed (age check) and pass that boolean instead of hardcoding true.

Suggested change
// XXX: this ignores must_be_older_than
return ready(Ok(ready(Ok((item, true))).boxed()));
// In dry_run, compute whether the item would be removed (age check)
let must_be_older_than = self.must_be_older_than;
let repo = &self.repo;
let fut = async move {
let is_old_enough = match &item {
graph::DatabaseItem::Object(digest, _flat_object) => {
repo.object_is_older_than(must_be_older_than, *digest).await.unwrap_or(false)
}
graph::DatabaseItem::Payload(digest) => {
repo.payload_is_older_than(must_be_older_than, *digest).await.unwrap_or(false)
}
};
Ok((item, is_old_enough))
};
return ready(Ok(fut.boxed()));

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preexisting condition that I don't intend to address in this PR.

Comment on lines +106 to 109
// TODO: This operation is unsafe, a payload must not be removed if _any_
// reference to it still exists, which could be in Manifests, Layer
// annotations, tags (can you tag a blob directly?), etc.
async fn remove_payload(&self, digest: encoding::Digest) -> Result<()> {
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_payload executes without referential integrity checks; deleting a referenced payload can corrupt manifests or layers. Implement a reference scan (e.g. manifests/layers referencing the digest) before deletion or maintain a reverse index to ensure safe removal.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preexisting condition that I don't intend to address in this PR.

@jrray jrray force-pushed the push-volovwxpnxry branch 3 times, most recently from b165c1a to b507fd1 Compare October 20, 2025 20:45
@jrray jrray changed the base branch from main to win-compile-errors October 20, 2025 20:46
@jrray jrray added the pr-chain This PR doesn't target the main branch, don't merge! label Oct 20, 2025
@jrray jrray force-pushed the push-volovwxpnxry branch from b507fd1 to 7356e01 Compare October 20, 2025 20:49
@jrray jrray force-pushed the win-compile-errors branch from 5b3352e to c766a07 Compare October 20, 2025 20:49
@jrray jrray force-pushed the push-volovwxpnxry branch from 7356e01 to 505d504 Compare October 20, 2025 23:53
@jrray jrray requested a review from Copilot October 20, 2025 23:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 66 out of 66 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 311 to 314
debug_assert!(
false,
"check_object should not be called on blobs, check the payload if appropriate"
);
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug_assert!(false) pattern is unconventional. Consider using unreachable!() or panic!() with a descriptive message, or if this is truly reachable, remove the debug_assert and rely solely on the error return.

Suggested change
debug_assert!(
false,
"check_object should not be called on blobs, check the payload if appropriate"
);
unreachable!("check_object should not be called on blobs, check the payload if appropriate");
// If unreachable!() is disabled (e.g., in some test harness), still return an error.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug assertion here makes it easier to find the caller via a traceback. In a production build an error is returned.

Comment on lines +106 to +108
// TODO: This operation is unsafe, a payload must not be removed if _any_
// reference to it still exists, which could be in Manifests, Layer
// annotations, tags (can you tag a blob directly?), etc.
Copy link

Copilot AI Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO comment mentions important safety considerations for payload removal. Given that the PR removes blob objects and their associated invariants, this safety concern should be addressed or documented more prominently, as payloads can now be removed without the protection that blob objects previously provided.

Suggested change
// TODO: This operation is unsafe, a payload must not be removed if _any_
// reference to it still exists, which could be in Manifests, Layer
// annotations, tags (can you tag a blob directly?), etc.
/// # Safety Warning
///
/// This operation is unsafe: a payload must not be removed if _any_ reference to it still exists,
/// which could be in manifests, layer annotations, tags, or other metadata.
/// Removing a payload that is still referenced can lead to repository corruption or data loss.
/// It is the caller's responsibility to ensure that the payload is no longer referenced anywhere
/// before invoking this method.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preexisting condition that I don't intend to address in this PR.

@jrray jrray force-pushed the push-volovwxpnxry branch 2 times, most recently from d1bf725 to 8df9867 Compare October 21, 2025 00:40
@jrray jrray requested a review from rydrman October 21, 2025 00:54
Base automatically changed from win-compile-errors to main October 22, 2025 17:41
@jrray jrray force-pushed the push-volovwxpnxry branch from 8df9867 to 2640233 Compare October 24, 2025 23:12
@jrray jrray changed the base branch from main to legacy-format-test-failures October 24, 2025 23:13
Copy link
Collaborator

@rydrman rydrman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple minor things from me, but no need to review again

jrray added 2 commits October 25, 2025 11:11
This should continue to work after removing blob objects.

Add a "test-fixtures" feature to spfs in order to use the fixtures in
external crates.

Signed-off-by: J Robert Ray <[email protected]>
This is also expected to still succeed after removing blob objects.

Signed-off-by: J Robert Ray <[email protected]>
jrray added 5 commits October 25, 2025 11:11
Blob files contain the digest and size of the payload. The digest is
redundant because it is the same as the file name of the blob. The size
is also available in the Entry type in Manifests. Reading a blob to
learn the digest of the payload requires already knowing the payload
digest in order to read the blob.

The invariant of a blob not existing unless the payload also exists has
proven to be difficult to maintain. By eliminating blobs, we eliminate
this invariant.

Signed-off-by: J Robert Ray <[email protected]>
Since we don't write a blob object anymore is may be confusing to
continue calling this "commit_blob".

Signed-off-by: J Robert Ray <[email protected]>
With blob files gone there is no longer an invariant to maintain when
writing payloads.

Signed-off-by: J Robert Ray <[email protected]>
Without a blob object to read, if something wants to know the size of a
payload it is possible with this to do it without having to read the
payload data.

Signed-off-by: J Robert Ray <[email protected]>
The blob existence invariant is gone and these operations are now safe.

Signed-off-by: J Robert Ray <[email protected]>
@jrray jrray force-pushed the push-volovwxpnxry branch from 2640233 to 9ef4438 Compare October 25, 2025 18:19
@jrray jrray force-pushed the legacy-format-test-failures branch from aefab7b to 5a0ef59 Compare October 25, 2025 18:19
jrray added 2 commits October 25, 2025 11:29
The previous commit that removed `write_blob` didn't actually accomplish
much because blobs were mostly being written with `write_object`. There
is now a debug assertion in `write_object` to prevent writing blobs.
Once this was in place, the tests starting revealing all the places that
were still writing blobs, and then all the places that didn't work
anymore when blobs don't exist.

This commit redefines some terminology used in the API. Previously an
"object" could include blob objects and their digests could be used
interchangeably with payload digests. Now payloads are not considered to
be an "object" and methods that operate on object digests do not work
for payloads. Some method names have been changed to make this more
apparent.

The term "item" is used for things (digests) that can be either an
object of a payload.

A "ref" can be a digest and it is expected that it is still possible to
interact with payloads via a ref. A tag can refer to a payload as well.

The 'find_digests' operation used to return digests for blob objects so
absent blobs it has been changed to return both object digests and
payload digests.

Signed-off-by: J Robert Ray <[email protected]>
@jrray jrray force-pushed the push-volovwxpnxry branch from 9ef4438 to 1c49e90 Compare October 25, 2025 18:31
@jrray
Copy link
Collaborator Author

jrray commented Oct 25, 2025

This is in a PR chain with #1284.

Base automatically changed from legacy-format-test-failures to main October 27, 2025 01:09
@jrray jrray removed the pr-chain This PR doesn't target the main branch, don't merge! label Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants