Move Selection logic into ReadPlan builder #7537
Conversation
/// 1. To have no empty selections (that select no rows)
/// 2. fall on a batch_size boundary (e.g. 0, 100, 200, 300)
///
/// TODO change this structure to an enum with emit + mask
Thank you @alamb, this is a great idea. It means we can build the range/bitmap at build time, and the adaptive policy can also be applied here.
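As a rough illustration of what "build the bitmap at plan-build time" could mean, here is a minimal sketch that expands select/skip runs into a per-row mask. `RowSelector` is reduced to a hypothetical local struct; this is not the crate's actual API.

```rust
/// Hypothetical, simplified stand-in for parquet's RowSelector, used only to
/// illustrate the idea; the real type lives in the parquet crate.
#[derive(Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Expand a list of select/skip runs into a per-row mask at plan-build time.
fn selectors_to_mask(selectors: &[RowSelector]) -> Vec<bool> {
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut mask = Vec::with_capacity(total);
    for s in selectors {
        // `true` means "emit this row", `false` means "skip it"
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}

fn main() {
    let selectors = [
        RowSelector { row_count: 3, skip: false },
        RowSelector { row_count: 2, skip: true },
        RowSelector { row_count: 1, skip: false },
    ];
    assert_eq!(
        selectors_to_mask(&selectors),
        vec![true, true, true, false, false, true]
    );
}
```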
/// How to select the next batch of rows to read from the Parquet file
///
/// This allows the reader to dynamically choose between decoding strategies
pub(crate) enum RowsPlan {
Beautiful enum, it will include all the cases!
Thank you.
It isn't quite done yet, but it is getting close.
I am thinking I will likely make 2 PRs:
- one PR that rearranges when the selectors are created, but still uses RowSelector
- a second PR that will switch to use this enum
However, I will leave out the BooleanBuffer part of the enum initially, I think, to keep the review load manageable.
Then I am thinking we can adapt the code from these PRs
- Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) #7454
- Draft POC Unified filter decoder #7503
- POC Adaptive predicate push down based read plan #7524
to implement filtering via mask.
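For illustration only, here is one hypothetical shape such a selector-or-mask enum could take. The names and types below are made up for the sketch; the enum in this PR is `RowsPlan`, and the real `RowSelector`/`BooleanBuffer` live in the parquet and arrow crates.

```rust
use std::collections::VecDeque;

/// Illustrative stand-ins: the real RowSelector and BooleanBuffer live in the
/// parquet and arrow crates, and the enum in this PR is named RowsPlan.
#[allow(dead_code)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}
#[allow(dead_code)]
struct BooleanMask(Vec<bool>);

/// One possible shape for a selection enum that supports both strategies.
#[allow(dead_code)]
enum BatchSelection {
    /// Select/skip runs: cheap to dispatch when the runs are long
    Selectors(VecDeque<RowSelector>),
    /// Per-row mask: decode the whole range and then filter, for short runs
    Mask(BooleanMask),
}
```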
It makes sense!
}
continue;
}
while read_records < batch_size {
The point of this PR is to (further) simplify this inner loop of the parquet decoder.
All the logic for splitting into batch sizes, etc. is now done in the ReadPlanBuilder,
so when this code is invoked it simply does whatever is called for.
The reason for this change is so we can add more complexity to the decision of what to do in subsequent PRs.
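Roughly, the shape this leaves the inner loop in might look like the sketch below. The trait and function names are illustrative stand-ins, not the decoder's real API, and error handling is omitted.

```rust
use std::collections::VecDeque;

/// Illustrative stand-ins, not the decoder's real types.
struct RowSelector {
    row_count: usize,
    skip: bool,
}

trait RecordReader {
    fn read_records(&mut self, n: usize) -> usize;
    fn skip_records(&mut self, n: usize) -> usize;
}

/// Consume the selectors planned for one batch. There is no batch-size math
/// here: the planner has already split the selectors so that none crosses a
/// batch boundary and no trailing skips remain.
fn read_one_batch<R: RecordReader>(
    batch_selectors: &mut VecDeque<RowSelector>,
    reader: &mut R,
) -> usize {
    let mut read_records = 0;
    while let Some(front) = batch_selectors.pop_front() {
        if front.skip {
            reader.skip_records(front.row_count);
        } else {
            read_records += reader.read_records(front.row_count);
        }
    }
    read_records
}
```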
It's clear!
if front.row_count == 0 {
continue;
}
if front.skip {
Other than error checking, the inner loop now simply reads a RowSelection and does what it says.
}

#[cfg(test)]
mod tests {
Technically the plan generation code is already fully covered by other tests in this crate, but I added new unit tests here to:
- Document the behavior better
- Make it easier to write tests for new behavior (like filter mask implementation)
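As an example of the kind of documenting test this enables, here is a sketch that checks the "no empty selectors, no trailing skips" guarantee against a hypothetical `normalize` helper. This is not the PR's actual test code.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical helper mirroring two of the plan guarantees:
/// drop empty selectors and any trailing skips.
fn normalize(mut selectors: Vec<RowSelector>) -> Vec<RowSelector> {
    selectors.retain(|s| s.row_count > 0);
    while matches!(selectors.last(), Some(s) if s.skip) {
        selectors.pop();
    }
    selectors
}

#[test]
fn empty_and_trailing_skip_selectors_are_removed() {
    let input = vec![
        RowSelector { row_count: 10, skip: false },
        RowSelector { row_count: 0, skip: false }, // empty: removed
        RowSelector { row_count: 5, skip: true },  // trailing skip: removed
    ];
    assert_eq!(
        normalize(input),
        vec![RowSelector { row_count: 10, skip: false }]
    );
}
```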
type Item = RowSelector;

fn next(&mut self) -> Option<Self::Item> {
while let Some(mut front) = self.input_selectors.pop_front() {
This logic used to be in the RecordBatchReader::next call. It is refactored out into its own module so it can be more easily tested and (eventually) extended.
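A minimal sketch of what such a splitting iterator can look like, using illustrative field and type names rather than the actual module's:

```rust
use std::collections::VecDeque;

/// Illustrative stand-in for the real RowSelector.
#[derive(Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Illustrative iterator: hands out selectors one at a time, splitting any
/// "select" run that would otherwise cross a batch_size boundary.
struct BatchSplitter {
    input_selectors: VecDeque<RowSelector>,
    batch_size: usize,
    /// rows selected so far within the current batch
    read_records: usize,
}

impl Iterator for BatchSplitter {
    type Item = RowSelector;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(mut front) = self.input_selectors.pop_front() {
            if front.row_count == 0 {
                continue; // drop empty selectors
            }
            if front.skip {
                return Some(front); // skips pass through unchanged
            }
            let remaining = self.batch_size - self.read_records;
            if front.row_count > remaining {
                // split: emit the part that fits this batch, push back the rest
                let rest = RowSelector {
                    row_count: front.row_count - remaining,
                    skip: false,
                };
                self.input_selectors.push_front(rest);
                front.row_count = remaining;
            }
            self.read_records = (self.read_records + front.row_count) % self.batch_size;
            return Some(front);
        }
        None
    }
}
```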
@@ -358,14 +361,6 @@ impl RowSelection {
self.selectors.iter().any(|x| !x.skip)
}

/// Trims this [`RowSelection`] removing any trailing skips
pub(crate) fn trim(mut self) -> Self {
This is moved into the plan.
I am pretty happy with how this currently looks, but before I mark it for review I want to make a proof of concept showing that I can actually improve performance with it.
🤖: Benchmark completed
LGTM. Thank you @alamb, great work!
}
continue;
}
while read_records < batch_size {
It's clear!
/// The returned stream of [`RowSelector`]s is guaranteed to have:
/// 1. No empty selections (that select no rows)
/// 2. No selections that span batch_size boundaries
/// 3. No trailing skip selections
///
/// For example, if the `batch_size` is 100 and we are selecting all 200 rows
/// from a Parquet file, the selectors will be:
/// - `RowSelector::select(100) <-- forced break at batch_size boundary`
/// - `RowSelector::select(100)`
Great work!
));
}
} else {
let read = self.array_reader.read_records(front.row_count)?;
Minor: do we have a fast path when read < front.row_count?
I am not quite sure what you mean here -- if read < row_count, I think that means the array is exhausted and the row group is done. What sort of fast path would it be?
That's right @alamb, sorry, I didn't describe it correctly. I meant: do we need to break early in that case? It looks like that's not needed.
🤔 It seems it is less efficient.
Update: this is not looking super promising and I am somewhat stuck on how to integrate mask-based selections into the logic more cleanly. I need to think about it some more. I may park this for a while and continue working on filter result caching.
🤖: Benchmark completed
Thank you very much @alamb, I can continue to help investigate why the performance regresses in some cases for this PR.
It looks like removing this check can improve performance a little, but some cases still regress; I can't find the root cause so far.

// Reader should read exactly `batch_size` records except for last batch
if !end_of_stream && (read_records != batch_size) {
    return Err(general_err!(
        "Internal Error: unexpected read count. Expected {batch_size} got {read_records}"
    ));
}
Thanks @zhuqi-lucas -- I will remove that check.
🤖: Benchmark completed
/// how many records have been read by RowSelection in the "current" batch
read_records: usize,
/// Input selectors to read from
input_selectors: VecDeque<RowSelector>,
I think Vec can be used here (track an index)?
I used VecDeque as that is how the current code does it. I can try it and see if it makes any difference.
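For reference, the Vec-plus-index alternative being discussed would look roughly like the sketch below (hypothetical names, not the code that was benchmarked). As noted further down, the benchmarks ended up favoring the VecDeque version here.

```rust
/// Illustrative stand-in for the real RowSelector.
#[derive(Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Sketch of tracking a cursor into a Vec instead of popping from a VecDeque.
struct SelectorCursor {
    selectors: Vec<RowSelector>,
    /// index of the next selector to hand out
    next: usize,
}

impl SelectorCursor {
    /// Equivalent of VecDeque::pop_front, but done by advancing an index.
    fn next_selector(&mut self) -> Option<RowSelector> {
        let sel = *self.selectors.get(self.next)?;
        self.next += 1;
        Some(sel)
    }
}
```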
Ok, the latest benchmark results I think are now better and show no regression, thanks to @zhuqi-lucas's suggestion. I will try @Dandandan's idea to use a Vec and see if that helps.
🤖: Benchmark completed
🤔 Seems like Vec made things slower (see results here).
🤖: Benchmark completed
Thank you @alamb, that's good news!
This PR has shown me that for some queries the dispatch overhead for RowSelection is quite high (as in, just doing an extra compare in that loop made a measurable difference). @zhuqi-lucas, in your testing, did you measure where the cutoff for using a bitmap vs a RowSelector was? I think I remember seeing a value of 10.
I agree @alamb, I was testing with 10 as the cutoff for using a bitmap vs a RowSelector. It's a very basic cutoff:

avg_size_of_selector = total_rows / num_selectors
- if avg_size_of_selector > 10, use selector
- if avg_size_of_selector <= 10, use bitmap

And the default is selector, because I use it to compute avg_size_of_selector.
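Expressed as code, that cutoff might look like the following sketch. The function and enum are hypothetical; the threshold of 10 is the experimental value described above.

```rust
/// Which representation to use for a window of selected rows.
enum SelectionKind {
    /// Long runs: use RowSelector ranges
    Selector,
    /// Short, scattered runs: use a per-row bitmap
    Bitmap,
}

/// The basic cutoff described above: compare the average run length per
/// selector against a fixed threshold (10 in the experiment above).
fn choose_selection_kind(total_rows: usize, num_selectors: usize) -> SelectionKind {
    if num_selectors == 0 {
        // no runs yet: default to selector, matching the description above
        return SelectionKind::Selector;
    }
    let avg_size_of_selector = total_rows / num_selectors;
    if avg_size_of_selector > 10 {
        SelectionKind::Selector
    } else {
        SelectionKind::Bitmap
    }
}
```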
Makes sense -- thank you. I found ... The other thing I couldn't easily work out was if there was any way to switch from ... Or maybe we could just add a third type of ...
Thank you @alamb, this is a very good point:
It may happen that we get 8192 => bitmap, 8192 => selector, 8192 => bitmap... We can't use only one of them for everything and still have it be optimal.
For example, suppose we have an output batch after selecting 5 batch-size windows:
We can merge windows 1 and 2 because they are both bitmaps. But I think we can start from the basic optimization and only use the batch-size window to decide between bitmap and selector; later, we can optimize further. Maybe we can have only selectors in the ReadPlan, but for an adaptive window size (currently fixed at batch size), we can change to a bitmap if it's dense, as a first step...
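Purely as a sketch of the idea discussed here (nothing like this is in the PR), the per-window choice plus the merge step could look like:

```rust
/// Illustrative per-window decision: each batch_size window picks one strategy.
#[derive(Debug, PartialEq, Clone, Copy)]
enum WindowPlan {
    Selector, // sparse selection / long runs
    Bitmap,   // dense, scattered selection
}

/// Merge adjacent windows that chose the same strategy, e.g.
/// [Bitmap, Bitmap, Selector, Bitmap] -> [(Bitmap, 2), (Selector, 1), (Bitmap, 1)]
fn merge_adjacent(windows: &[WindowPlan]) -> Vec<(WindowPlan, usize)> {
    let mut merged: Vec<(WindowPlan, usize)> = Vec::new();
    for &w in windows {
        match merged.last_mut() {
            Some((kind, count)) if *kind == w => *count += 1,
            _ => merged.push((w, 1)),
        }
    }
    merged
}
```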
This is an interesting idea and I think it is worth exploring.
👍 Another thing that makes this tricky in my mind is that if ...
Very good point @alamb! It's hard for us to reduce its overhead; maybe we can set something like max_bitmap_iterator: when the bitmap iterator count hits > max_bitmap_iterator, we can consume it first as an output batch, and then merge those batches at the end. But I am not sure whether it will make the performance worse than using selectors.
Note that 400+ lines of this PR are new unit tests. The actual code change is much smaller.
Draft until
Which issue does this PR close?
This is a step towards implementing Adaptive parquet filter selections:
Rationale for this change
Part of the idea of adaptive decoding is the need to have different read strategies based on the patterns of rows selected.
The current code mixes ...
This makes it hard to add additional complexity to determining the read/skip pattern; for example, @zhuqi-lucas had to put the bitmap selection logic in the middle of the decoder here:
Similarly to the way the filter kernel decides up front how to scan, I think we should also change the parquet reader to determine what to do up front and then just do it during decode. Splitting the planning from the execution also gives us a place to generate (and unit test) various heuristics for the plan.
Change
There is no change in behavior intended -- the selection evaluation is not yet adaptive. This is meant to be a pure refactoring. I have added tests and a test framework to make it easier to make this adaptive in the future.
What changes are included in this PR?
Are there any user-facing changes?
Next up, I will change from RowSelector to a different enum.