Incremental `GROUP BY _block_num` and `DISTINCT BY _block_num` by leoyvens · Pull Request #1877 · edgeandnode/amp

leoyvens · 2026-02-27T14:49:08Z

This implements two related features:

- Streaming `DISTINCT ON` and `GROUP BY`, when the key is `_block_num`.
- Function-call syntax `block_num()`, to more easily refer to the block number anywhere in the query.

Our execution semantics essentially already support the special case where an aggregation is restricted to a single block, thanks to the assumption that data for the same block never spans more than one microbatch. Executing the aggregation in isolation on each microbatch therefore yields correct results. The necessary changes were in the incremental query validation checks and in block number propagation.
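The argument above can be illustrated with a small, self-contained sketch. It is not code from this PR: `Row`, `sum_by_block`, and `sum_by_block_streaming` are hypothetical names, and plain vectors stand in for microbatches. It only shows why aggregating each microbatch in isolation matches aggregating everything at once, as long as no block spans two microbatches.

```rust
use std::collections::BTreeMap;

/// A row tagged with the block it came from (hypothetical shape, for illustration only).
#[derive(Clone, Copy, Debug)]
pub struct Row {
    pub block_num: u64,
    pub value: u64,
}

/// Rough equivalent of `SELECT _block_num, SUM(value) ... GROUP BY _block_num`
/// evaluated over a single batch of rows.
pub fn sum_by_block(rows: &[Row]) -> BTreeMap<u64, u64> {
    let mut sums = BTreeMap::new();
    for row in rows {
        *sums.entry(row.block_num).or_insert(0) += row.value;
    }
    sums
}

/// Runs the aggregation on each microbatch in isolation and concatenates the outputs.
/// Assumption: all rows of a given block live in exactly one microbatch. If a block
/// were split across two microbatches, it would show up twice with partial sums.
pub fn sum_by_block_streaming(microbatches: &[Vec<Row>]) -> Vec<(u64, u64)> {
    microbatches
        .iter()
        .flat_map(|batch| sum_by_block(batch))
        .collect()
}
```

With microbatches ordered by block number, the streamed output equals the output of a single `sum_by_block` call over all rows. The same reasoning applies to a `DISTINCT ON` keyed on the block number, since deduplication never needs to look outside the block.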

Signed-off-by: Leonardo Yvens <leoyvens@gmail.com>

Allow GROUP BY queries that include _block_num as a group key to work with incremental processing instead of being rejected.

- Handle Aggregate in BlockNumPropagator by setting next_block_num_expr
- Remove Aggregate from the unsupported-node error arm

Signed-off-by: Leonardo Yvens <leoyvens@gmail.com>
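As a rough illustration of the validation change described in this commit message, the sketch below uses a toy plan type. `Plan`, `check_incremental`, and the `Other` variant are hypothetical and are not the types used in amp or DataFusion; the real check and `BlockNumPropagator` are not shown in this conversation. The point is only that an aggregate moves out of the rejected set when `_block_num` is among its group keys.

```rust
/// Toy logical plan (hypothetical; not amp's or DataFusion's plan types).
pub enum Plan {
    Scan,
    Filter(Box<Plan>),
    Aggregate {
        group_keys: Vec<String>,
        input: Box<Plan>,
    },
    /// Stand-in for any node kind that stays in the unsupported-node error arm.
    Other(String),
}

pub const BLOCK_NUM: &str = "_block_num";

/// Returns Ok(()) if the toy plan may run incrementally.
/// Assumed rule: an aggregate is accepted only when `_block_num` is a group key.
pub fn check_incremental(plan: &Plan) -> Result<(), String> {
    match plan {
        Plan::Scan => Ok(()),
        Plan::Filter(input) => check_incremental(input),
        Plan::Aggregate { group_keys, input } => {
            if group_keys.iter().any(|key| key == BLOCK_NUM) {
                // Grouping is confined to a block, so check the rest of the plan.
                check_incremental(input)
            } else {
                Err(format!("aggregate must group by {BLOCK_NUM}"))
            }
        }
        Plan::Other(name) => Err(format!("unsupported node: {name}")),
    }
}
```

In this sketch an aggregate grouped by `_block_num` and `address` passes, while one grouped only by `address` is still rejected.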


LNSD

Please, check my comments 🙂

LNSD · 2026-02-27T15:14:57Z

crates/core/common/src/lib.rs

 pub use datafusion::{arrow, parquet};
 pub use datasets_common::{block_num::BlockNum, block_range::BlockRange, end_block::EndBlock};

+pub mod block_num_udf;


Can we rename the module to just block_num? Maybe we could also move all the UDFs under common::udfs (e.g., common::udfs::evm::* or common::udfs::block_num).
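For reference, the layout suggested in this comment might look roughly like the sketch below. Inline modules stand in for separate files, and the `NAME` constant is a hypothetical placeholder; the actual contents of the UDF modules are not shown in this conversation.

```rust
/// Sketch of a possible `common::udfs` layout (inline modules stand in for files).
pub mod udfs {
    /// Would hold the `block_num()` UDF.
    pub mod block_num {
        /// Hypothetical constant: the SQL name the UDF is called by.
        pub const NAME: &str = "block_num";
    }

    /// Would hold the EVM-related UDFs.
    pub mod evm {}
}
```

Callers would then reach the UDF through a path such as `common::udfs::block_num` rather than a top-level `common::block_num_udf`.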

Signed-off-by: Leo <leo@edgeandnode.com>

leoyvens and others added 6 commits February 26, 2026 15:13

feat(common): support incremental DISTINCT ON with _block_num

aaf3286

Signed-off-by: Leonardo Yvens <leoyvens@gmail.com>

implement block_num() UDF

d8b0051

test(tests): add integration tests for streaming DISTINCT ON and GROUP BY

c83c54a

chore(common): fmt

f73605a

refactor(common): extract plan_visitors tests to separate file

e1f3c62

leoyvens requested a review from Theodus February 27, 2026 14:49

leoyvens marked this pull request as draft February 27, 2026 14:49

chore(common): fmt

d95cad5

leoyvens marked this pull request as ready for review February 27, 2026 15:00

LNSD reviewed Feb 27, 2026


refactor(common): rename block_num_udf module to block_num

b7bf6da

Signed-off-by: Leo <leo@edgeandnode.com>

leoyvens wants to merge 8 commits into main from incremental-distinct-on
