persist: expose API for writing/reading "free-standing" batches #32513

Open: wants to merge 1 commit into base: main

aljoscha
Contributor

Add API for writing and reading batches without creating a WriteHandle/ReadHandle. Writing batches still requires a ShardId, which is used for namespacing in blob storage.

Work towards https://github.com/MaterializeInc/database-issues/issues/9180, where we want to use persist batches/blob to stash peek results when sending them back to environmentd.

@aljoscha aljoscha requested a review from a team as a code owner May 16, 2025 11:47
@@ -121,6 +133,9 @@ pub const BUILD_INFO: BuildInfo = build_info!();
// Re-export for convenience.
pub use mz_persist_types::{PersistLocation, ShardId};

pub use crate::internal::encoding::Schemas;
pub use crate::internal::state::HollowBatch;
Contributor Author

You might object to these being exposed. We only need HollowBatch because we use it to sniff out the size/num_rows from stashed batches without turning them into a Batch and/or reading them.
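
To illustrate the kind of metadata-only inspection described here, a minimal sketch follows. `PartMeta`, `BatchMeta`, and `stash_totals` are hypothetical stand-ins invented for this example and are not the real `HollowBatch` API; the only assumption is that a batch descriptor records a row count and per-part encoded sizes, so totals can be computed without fetching any data.

```rust
/// Hypothetical stand-in for a reference to one stored part.
/// It carries only metadata, never the row data itself.
#[derive(Debug, Clone)]
pub struct PartMeta {
    pub key: String,
    pub encoded_size_bytes: usize,
}

/// Hypothetical stand-in for a "hollow" batch descriptor: a row count
/// plus references to the parts held in blob storage.
#[derive(Debug, Clone)]
pub struct BatchMeta {
    pub num_rows: usize,
    pub parts: Vec<PartMeta>,
}

impl BatchMeta {
    /// Sum of the encoded sizes of all parts, computed from metadata only.
    pub fn encoded_size_bytes(&self) -> usize {
        self.parts.iter().map(|p| p.encoded_size_bytes).sum()
    }
}

/// Total (rows, bytes) across stashed batches, without reading any part.
pub fn stash_totals(batches: &[BatchMeta]) -> (usize, usize) {
    batches.iter().fold((0, 0), |(rows, bytes), b| {
        (rows + b.num_rows, bytes + b.encoded_size_bytes())
    })
}
```

A caller could use totals like these to decide how to handle a stashed result before turning anything into a full batch.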

/// enough that we can reasonably chunk them up: O(KB) is definitely fine,
/// O(MB) come talk to us.
#[instrument(level = "debug", fields(shard = %shard_id))]
pub async fn batch_builder<K, V, T, D>(
Contributor Author

These are basically duplicated code. I could add a WriteHandle::batch_builder_inner that takes no self, only the arguments it needs, so that both WriteHandle and the code here could call it.

Same for read_batches_consolidated below.

I didn't do this for now because I felt that the method signature was almost the largest part of these methods. But happy to change that!
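
For reference, the deduplication floated above might look roughly like the sketch below. All types here are simplified, hypothetical stand-ins (the real `batch_builder` is async, generic over `K, V, T, D`, and takes more arguments); the sketch only shows the shape: an associated function without `self` that both the handle method and the free-standing function delegate to.

```rust
/// Hypothetical, simplified stand-in for a shard identifier.
#[derive(Debug, Clone, PartialEq)]
pub struct ShardId(pub String);

/// Hypothetical, simplified stand-in for a batch builder.
#[derive(Debug, PartialEq)]
pub struct BatchBuilder {
    pub shard_id: ShardId,
    pub updates: Vec<(String, u64, i64)>,
}

/// Hypothetical, simplified stand-in for a write handle.
pub struct WriteHandle {
    shard_id: ShardId,
}

impl WriteHandle {
    pub fn new(shard_id: ShardId) -> Self {
        WriteHandle { shard_id }
    }

    /// Shared logic: takes no `self`, only the arguments it needs.
    pub(crate) fn batch_builder_inner(shard_id: &ShardId) -> BatchBuilder {
        BatchBuilder {
            shard_id: shard_id.clone(),
            updates: Vec::new(),
        }
    }

    /// Handle-based entry point: forwards the handle's own state.
    pub fn batch_builder(&self) -> BatchBuilder {
        Self::batch_builder_inner(&self.shard_id)
    }
}

/// Free-standing entry point: no handle required, only a shard id.
pub fn batch_builder(shard_id: &ShardId) -> BatchBuilder {
    WriteHandle::batch_builder_inner(shard_id)
}
```

The trade-off is the one noted above: with a signature as long as the real one, the forwarding wrappers may end up nearly as large as the bodies they replace.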

@aljoscha aljoscha requested review from bkirwi and ParkMyCar May 16, 2025 11:52
@aljoscha aljoscha force-pushed the persist-free-standing-batch-api branch 2 times, most recently from ed49b52 to 207a19e Compare May 16, 2025 13:01
@aljoscha aljoscha force-pushed the persist-free-standing-batch-api branch from 207a19e to c4221d1 Compare May 16, 2025 13:50