compute: factor out PeekResultIterator #32514

aljoscha · 2025-05-16T11:53:27Z

The original motivation for this is so that the code that extracts peek results can be re-used in
https://github.com/MaterializeInc/database-issues/issues/9180, where we want to use a different transport for sending back peek responses but still need to read them out of arrangements the same way.

A nice side effect is that we separate extracting the result from the logic that accumulates it into a response for sending it back, which gives a clearer separation of concerns.

Work towards https://github.com/MaterializeInc/database-issues/issues/9180
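To illustrate the split described above, here is a minimal sketch, assuming plain `(String, i64)` pairs stand in for rows and diffs. `extract_results`, `to_in_memory_response`, and `to_stream` are hypothetical names, not the PR's API. The point is that one extraction step can feed more than one sink.

```rust
/// Hypothetical extraction step: turns raw `(row, diff)` updates into the
/// rows a peek should return. It knows nothing about how results are shipped.
fn extract_results(updates: Vec<(String, i64)>) -> impl Iterator<Item = (String, i64)> {
    // Drop updates that contribute no copies.
    updates.into_iter().filter(|(_, diff)| *diff != 0)
}

/// Sink 1 (hypothetical): accumulate everything into an in-memory response.
fn to_in_memory_response(iter: impl Iterator<Item = (String, i64)>) -> Vec<(String, i64)> {
    iter.collect()
}

/// Sink 2 (hypothetical): a stand-in for a different transport that streams
/// one entry per copy of each row.
fn to_stream(iter: impl Iterator<Item = (String, i64)>, out: &mut Vec<String>) {
    for (row, diff) in iter {
        for _ in 0..diff {
            out.push(row.clone());
        }
    }
}
```

Neither sink repeats the extraction logic, which is what lets a second transport reuse it unchanged.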

aljoscha · 2025-05-16T11:54:06Z

Maybe @antiguru and/or @teskje would be good to review this? 🙏

teskje · 2025-05-28T13:22:38Z

src/compute/src/compute_state.rs

+            PendingPeek::Index(peek) => 'response: {
+                let is_ready = peek.is_ready(upper);
+
+                match is_ready {
+                    Ok(false) => break 'response None,
+                    Err(err) => break 'response Some(err),
+                    Ok(true) => (), // Falling through...,
+                }
+
+                if let Some(err) = peek.extract_errs(upper) {
+                    break 'response Some(err);
+                }
+
+                Some(peek.read_result(upper, self.compute_state.max_result_size))


I like early returns usually, but in this case the nested-if version ends up more readable, imo:

PendingPeek::Index(peek) => match peek.is_ready(upper) {
    Ok(true) => {
        let resp = peek.extract_errs(upper).unwrap_or_else(|| {
            peek.read_result(upper, self.compute_state.max_result_size)
        });
        Some(resp)
    }
    Ok(false) => None,
    Err(err) => Some(err),
},
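Both shapes should produce the same response. Here is a small sketch that checks this, assuming a hypothetical `MockPeek` that models only the three calls used above, with `String` standing in for the response type and the `upper`/`max_result_size` arguments dropped:

```rust
/// Hypothetical stand-in for an index peek; only the calls from the review are modeled.
struct MockPeek {
    ready: Result<bool, String>,
    errs: Option<String>,
    result: String,
}

impl MockPeek {
    fn is_ready(&self) -> Result<bool, String> { self.ready.clone() }
    fn extract_errs(&self) -> Option<String> { self.errs.clone() }
    fn read_result(&self) -> String { self.result.clone() }
}

/// Labeled-block style, as in the diff.
fn respond_labeled(peek: &MockPeek) -> Option<String> {
    'response: {
        match peek.is_ready() {
            Ok(false) => break 'response None,
            Err(err) => break 'response Some(err),
            Ok(true) => (), // fall through
        }
        if let Some(err) = peek.extract_errs() {
            break 'response Some(err);
        }
        Some(peek.read_result())
    }
}

/// Nested-match style, as suggested in the review.
fn respond_match(peek: &MockPeek) -> Option<String> {
    match peek.is_ready() {
        Ok(true) => Some(peek.extract_errs().unwrap_or_else(|| peek.read_result())),
        Ok(false) => None,
        Err(err) => Some(err),
    }
}
```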

Sorry, this was from an outdated branch. I've updated it and backed out all of the extra refactorings; now it's only about the peek iterator.


teskje · 2025-05-28T13:33:23Z

src/compute/src/compute_state/peek_result_iterator.rs

+        peek_timestamp: mz_repr::Timestamp,
+        has_literal_constraints: bool,
+        literals: L,
+        oks_handle: &mut Tr,


Nit: It's a bit confusing to have an oks_handle if we don't also have an errs_handle. We could call it trace/trace_reader/trace_handle instead?

will change to trace_reader

teskje · 2025-05-28T13:33:51Z

src/compute/src/compute_state/peek_result_iterator.rs

+        literals: L,
+        oks_handle: &mut Tr,
+    ) -> Self {
+        let (cursor, storage): (<Tr as TraceReader>::Cursor, <Tr as TraceReader>::Storage) =


These type annotations are not needed, are they?

I think not; these were from some work in progress where the types didn't line up.

teskje · 2025-05-28T13:39:51Z

src/compute/src/compute_state/peek_result_iterator.rs

+        tracing::trace!(
+            ?self.literals_exhausted,
+            key_valid = self.cursor.key_valid(&self.storage),
+            val_valid = self.cursor.val_valid(&self.storage), "next");


How useful are these traces? I think they will mostly print a lot of true/false that are hard to interpret? Should they also print the actual keys and values?

Reading the key/val wouldn't always succeed because it fails when they're not valid. So I should rather remove all these trace logs, yeah?

If they are not useful I'd remove them, yes. There is some small readability/maintainability cost associated with them that we can avoid.

teskje · 2025-05-28T13:41:41Z

src/compute/src/compute_state/peek_result_iterator.rs

+        if self.cursor.val_valid(&self.storage) {
+            return false;
+        }


This is surprising, I think. The docstring says that this method will step the key forward, but it only does so if the current value is not valid, which seems like an important detail!

A thing that would make sense to check here is key_valid. Maybe you meant to do that?

The intended behavior was actually that of a maybe_step_key: only step when the val is not valid, which is what the code did. In practice the method is only called when the val is not valid, so I changed that check into an assert and left the docstring as is.
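The "assert the precondition instead of silently returning" approach can be sketched with a hypothetical `ToyCursor` over sorted keys, each holding its own list of values. `ToyCursor`, `step_key_after_vals`, and `drain` are illustrative names, and the `bool` return meaning ("now on a valid key") is an assumption rather than the PR's contract.

```rust
/// Hypothetical toy cursor: sorted keys, each with its own list of values.
struct ToyCursor {
    data: Vec<(u32, Vec<u32>)>,
    key_idx: usize,
    val_idx: usize,
}

impl ToyCursor {
    fn new(data: Vec<(u32, Vec<u32>)>) -> Self {
        ToyCursor { data, key_idx: 0, val_idx: 0 }
    }
    fn key_valid(&self) -> bool {
        self.key_idx < self.data.len()
    }
    fn val_valid(&self) -> bool {
        // Short-circuit keeps the index in bounds.
        self.key_valid() && self.val_idx < self.data[self.key_idx].1.len()
    }
    fn step_val(&mut self) {
        self.val_idx += 1;
    }
    fn step_key(&mut self) {
        self.key_idx += 1;
        self.val_idx = 0;
    }
}

/// Moves to the next key. Must only be called once the current key's values
/// are exhausted; the assert enforces that instead of silently doing nothing.
/// Returns whether the cursor now points at a valid key.
fn step_key_after_vals(cursor: &mut ToyCursor) -> bool {
    assert!(!cursor.val_valid(), "values of the current key are not exhausted");
    cursor.step_key();
    cursor.key_valid()
}

/// Walks every (key, val) pair, calling `step_key_after_vals` between keys.
fn drain(cursor: &mut ToyCursor) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    while cursor.key_valid() {
        while cursor.val_valid() {
            let (key, vals) = &cursor.data[cursor.key_idx];
            out.push((*key, vals[cursor.val_idx]));
            cursor.step_val();
        }
        step_key_after_vals(cursor);
    }
    out
}
```

A key with zero values is skipped without tripping the assert, because its `val_valid` is already `false`.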

teskje · 2025-05-28T13:58:20Z

src/compute/src/compute_state/peek_result_iterator.rs

+            if self.cursor.val_valid(&self.storage) {
+                break;
+            }


For this break to be invoked we would need to have a key with zero values. Is this possible?

I think you meant it the other way round, right? That you expect val_valid to always be true so we always break?

I think it can happen that we go around the loop multiple times, but I don't remember why. I pushed a commit that panics when that happens, so let's see what CI has to say.

Ah sorry, yeah that's what I meant, I just got the condition flipped.

teskje · 2025-05-28T14:08:16Z

Pretty sure this will conflict with #32593, sorry 🙈

aljoscha · 2025-05-28T16:49:58Z

@teskje thanks for the review! I hope I addressed all comments, could you please take a look again? 🙇‍♂️

teskje

Sorry, I had only gotten to the PeekResultIterator implementation before. Now I've got everything. Looks good overall, with some comments about tightening the invariants.

teskje · 2025-05-28T17:13:01Z

src/compute/src/compute_state.rs

+        // We have to sort the literal constraints because cursor.seek_key can
+        // seek only forward.
+        peek.literal_constraints
+            .iter_mut()
+            .for_each(|vec| vec.sort());
+        let has_literal_constraints = peek.literal_constraints.is_some();
+        let literals = Option::take(&mut peek.literal_constraints)
+            .into_iter()
+            .flatten();


What do you think of moving the sorting into PeekResultIterator::new? It's an invariant the PeekResultIterator requires but doesn't document and cannot check.

Doing this would require changing the type of the literals argument to something containing Vec, not sure if that causes problems.

Will do! I'll change it to take an Option<Vec<_>> and tighten up that unspoken invariant. 👌
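Here is why the sort matters when the cursor can only seek forward. This sketch assumes a hypothetical `ForwardCursor` over a sorted `u32` slice, with `slice::partition_point` modeling a forward-only `seek_key`. `lookup` takes an `Option<Vec<_>>` in the spirit of the proposed signature and does the sorting itself, so the invariant becomes local.

```rust
/// Hypothetical forward-only cursor over sorted keys.
struct ForwardCursor<'a> {
    keys: &'a [u32],
    pos: usize,
}

impl<'a> ForwardCursor<'a> {
    /// Advances to the first key >= `target`; never moves backwards.
    fn seek_key(&mut self, target: u32) {
        self.pos += self.keys[self.pos..].partition_point(|k| *k < target);
    }
    fn get_key(&self) -> Option<u32> {
        self.keys.get(self.pos).copied()
    }
}

/// Returns the keys matching `literals`, or all keys if there are no
/// literal constraints. Sorting happens here, so callers can't forget it.
fn lookup(keys: &[u32], literals: Option<Vec<u32>>) -> Vec<u32> {
    let Some(mut literals) = literals else {
        return keys.to_vec(); // no constraints: full scan
    };
    literals.sort();
    let mut cursor = ForwardCursor { keys, pos: 0 };
    let mut found = Vec::new();
    for lit in literals {
        cursor.seek_key(lit);
        if cursor.get_key() == Some(lit) {
            found.push(lit);
        }
    }
    found
}
```

Without the sort, literals `[7, 1]` would seek to `7` first and could never come back for `1`.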

teskje · 2025-05-28T17:22:44Z

src/compute/src/compute_state/peek_result_iterator.rs

+            // if copies > 0 ... otherwise skip
+            if let Some(copies) = NonZeroI64::new(copies) {
+                Ok(Some((result, copies)))
+            } else {
+                Ok(None)
+            }


The comment is somewhat misleading: We only test if copies != 0 in the line below. We already tested that copies is not negative above, so it's technically not wrong, but... still confusing!

I would:

Remove the comment, it doesn't add anything imo.

Change the diff output type of the iterator to NonZeroUsize.

The second change also saves an unwrap in the caller, which is nice.

The original code had NonZeroUsize, but I changed it because the large SELECT changes, which stash these updates in a persist batch, need to write i64. The data coming out of the trace is also i64, so it felt better to preserve the type here and do the cast/expect in the current in-memory peek path. Otherwise I'd have to cast from NonZeroUsize back to i64 in the persist stash work, which felt worse.

I think a more correct solution here could be to change it to NonZeroI64 throughout the whole code path for sending back the result, but I didn't want to go down that rabbit hole. What do you think now?
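A sketch of the trade-off being discussed, under the assumption that the iterator keeps `NonZeroI64` and only the in-memory path converts. Both `copies_from_diff` and `to_in_memory_count` are hypothetical helpers, and the error text is illustrative. Note that `NonZeroI64::new` already covers the zero case, so the commented `if/else` from the diff collapses to one line.

```rust
use std::num::{NonZeroI64, NonZeroUsize};

/// Hypothetical: negative multiplicities are an error, zero means "skip",
/// and positive counts are kept as i64 so other sinks can write them as-is.
fn copies_from_diff(diff: i64) -> Result<Option<NonZeroI64>, String> {
    if diff < 0 {
        return Err(format!("negative multiplicity: {diff}"));
    }
    // `NonZeroI64::new` returns `None` for zero, replacing the explicit if/else.
    Ok(NonZeroI64::new(diff))
}

/// Hypothetical: the in-memory response path converts at the edge, which is
/// where the cast/expect discussed above would live.
fn to_in_memory_count(copies: NonZeroI64) -> NonZeroUsize {
    let n = usize::try_from(copies.get()).expect("copies is positive and fits in usize");
    NonZeroUsize::new(n).expect("copies is non-zero")
}
```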

teskje · 2025-05-28T17:30:01Z

src/compute/src/compute_state/peek_result_iterator.rs

+        assert_eq!(
+            false,
+            self.cursor.val_valid(&self.storage),


Suggested change

-        assert_eq!(
-            false,
-            self.cursor.val_valid(&self.storage),
+        assert!(
+            !self.cursor.val_valid(&self.storage),

I did it like this on purpose because I don't like the assert!(! part with the two exclamation marks, but I'm happy to change it.

antiguru

I think this is fine. Left some comments inline.

On a higher level, this code could use some modernization. We don't need the separate key/val_valid APIs anymore, we can just use the _get variants in all places---they return options, which avoids one of the bounds checks, and might make the code more obvious in what it's doing. Feel free to pick this up as part of this PR, but don't block on it.
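Here is the difference between the two API styles on a hypothetical `KeyCursor` (not differential-dataflow's actual cursor type). The older style needs a validity check plus an accessor that, in this toy, indexes directly and would panic if the check were skipped. The `get_` style folds both into one `Option`-returning call.

```rust
/// Hypothetical cursor exposing both API styles.
struct KeyCursor {
    keys: Vec<u32>,
    pos: usize,
}

impl KeyCursor {
    fn new(keys: Vec<u32>) -> Self {
        KeyCursor { keys, pos: 0 }
    }
    fn key_valid(&self) -> bool {
        self.pos < self.keys.len()
    }
    /// Panics if `key_valid()` is false.
    fn key(&self) -> u32 {
        self.keys[self.pos]
    }
    /// Returns `None` once the cursor is exhausted.
    fn get_key(&self) -> Option<u32> {
        self.keys.get(self.pos).copied()
    }
    fn step_key(&mut self) {
        self.pos += 1;
    }
}

/// Older style: separate validity check, then an accessor that re-checks bounds.
fn sum_valid_style(c: &mut KeyCursor) -> u32 {
    let mut sum = 0;
    while c.key_valid() {
        sum += c.key();
        c.step_key();
    }
    sum
}

/// `get_` style: one call both checks validity and yields the key.
fn sum_get_style(c: &mut KeyCursor) -> u32 {
    let mut sum = 0;
    while let Some(k) = c.get_key() {
        sum += k;
        c.step_key();
    }
    sum
}
```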

antiguru · 2025-05-29T07:29:11Z

src/compute/src/compute_state/peek_result_iterator.rs

+                    {
+                        // The cursor found a record whose key matches the current literal.
+                        // We return and calls to `next()` will start
+                        // returning it's vals.


Suggested change

-                        // returning it's vals.
+                        // returning its vals.

antiguru · 2025-05-29T07:33:04Z

src/compute/src/compute_state/peek_result_iterator.rs

+                    if !self.cursor.key_valid(&self.storage) {
+                        return;
+                    }
+                    if self.cursor.get_key(&self.storage).unwrap()
+                        == IntoOwned::borrow_as(current_literal)


Suggested change

-                    if !self.cursor.key_valid(&self.storage) {
-                        return;
-                    }
-                    if self.cursor.get_key(&self.storage).unwrap()
-                        == IntoOwned::borrow_as(current_literal)
+                    if self.cursor.get_key(&self.storage).map_or(true, |key| key == IntoOwned::borrow_as(current_literal))

We can simplify this a bit.
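The simplification works if, as the comment in the diff suggests, the matching branch simply returns. Under that assumption, an exhausted cursor (`None`) takes the same early-return path as a matching key, which is also why the comment gets amended to "or we're exhausted". This sketch uses a hypothetical `seek_to_next_literal` over a sorted `u32` slice with a forward-only position, not the PR's code.

```rust
/// Hypothetical: advance `pos` over sorted `keys` until it sits on a key
/// equal to one of the remaining sorted `literals`, or the keys run out.
fn seek_to_next_literal(
    keys: &[u32],
    pos: &mut usize,
    literals: &mut impl Iterator<Item = u32>,
) {
    for literal in literals {
        // Forward-only seek to the first key >= literal.
        *pos += keys[*pos..].partition_point(|k| *k < literal);
        // A matching key and an exhausted cursor both end the search.
        if keys.get(*pos).map_or(true, |k| *k == literal) {
            return;
        }
    }
}
```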

antiguru · 2025-05-29T07:35:07Z

src/compute/src/compute_state/peek_result_iterator.rs

+                    // NOTE(vmarcos): We expect the extra allocations below to be manageable
+                    // since we only perform as many of them as there are literals.


Suggested change

-                    // NOTE(vmarcos): We expect the extra allocations below to be manageable
-                    // since we only perform as many of them as there are literals.

antiguru · 2025-05-29T07:35:53Z

src/compute/src/compute_state/peek_result_iterator.rs

+                    if self.cursor.get_key(&self.storage).unwrap()
+                        == IntoOwned::borrow_as(current_literal)
+                    {
+                        // The cursor found a record whose key matches the current literal.


Suggested change

-                        // The cursor found a record whose key matches the current literal.
+                        // The cursor found a record whose key matches the current literal, or we're exhausted.

antiguru · 2025-05-29T07:39:01Z

src/compute/src/compute_state/peek_result_iterator.rs

+    /// Extracts and returns the row currently pointed at by our cursor. Returns
+    /// `Ok(None)` if our MapFilterProject evaluates to `None`. Also returns any
+    /// errors that arise from evaluating the MapFilterProject.
+    fn extract_current_row(&mut self) -> Result<Option<(Row, NonZeroI64)>, String> {


I'd try to pass Tr::Key and Tr::Val to this function, if possible.

I tried, but stopped because this function also uses the cursor for other things, like map_times, and I didn't want to spend too much time changing it beyond the original code.


aljoscha · 2025-05-29T16:03:55Z

@antiguru & @teskje Pushed commits to address your comments. If you want you could take another look 👌

aljoscha requested a review from a team as a code owner May 16, 2025 11:53

antiguru self-requested a review May 16, 2025 12:06

aljoscha force-pushed the compute-refactor-peek-result-iterator branch from 570a68c to c3a400a May 16, 2025 12:32

aljoscha force-pushed the compute-refactor-peek-result-iterator branch 2 times, most recently from e75dd05 to 214ebd8 May 28, 2025 14:00

teskje reviewed May 28, 2025

aljoscha changed the title ~~compute: factor out PeekResultIterator, refactor peek fulfillment~~ compute: factor out PeekResultIterator May 28, 2025

aljoscha force-pushed the compute-refactor-peek-result-iterator branch from 4640550 to a74366e May 28, 2025 16:43

teskje reviewed May 28, 2025

antiguru approved these changes May 29, 2025

aljoscha added 7 commits May 29, 2025 17:20

fixup! result iterator

18fb615

fixup! tighten contract around sorting of literal constraints

a099b35

fixup! remove trace logging

8aa214c

fixup! nits

fdc083b

compute: modernize trace reader usage in PeekResultIterator

c6e6444

suss out when the val is not valid after stepping

7eb5470

aljoscha force-pushed the compute-refactor-peek-result-iterator branch from a74366e to 7eb5470 May 29, 2025 15:21
