Move Selection logic into ReadPlan builder #7537

Draft
wants to merge 1 commit into
base: main
from

Conversation

alamb
Contributor

@alamb alamb commented May 21, 2025

Note that 400+ lines of this PR are new unit tests. The actual code change is much smaller

Draft until

  • POC for making things faster with this approach

Which issue does this PR close?

This is a step towards implementing Adaptive parquet filter selections:

Rationale for this change

Part of the idea of adaptive decoding is the need to have different read strategies based on the patterns of rows selected

The current code mixes

  1. The determination of the exact read/skip pattern
  2. The actual decoding of the rows.

This makes it hard to add more complexity to determining the read/skip pattern. For example, @zhuqi-lucas had to put the Bitmap selection logic in the middle of the decoder here:

Similarly to the way the filter kernel decides up front how to scan, I think we should also change the parquet reader to determine what to do up front and then just do it during decode.

Splitting the planning from the execution also gives us a place to generate (and unit test) various heuristics for the plan

Change

  1. Move the calculation of when to read/emit rows into ReadPlan construction
  2. Decode simply

There is no change in behavior intended -- the selection evaluation is not yet adaptive. This is meant to be a pure refactoring. I have added tests / test framework to make it easier to make this adaptive in the future

What changes are included in this PR?

Are there any user-facing changes?

Next up I will change from RowSelector into a different enum

/// How to select the next batch of rows to read from the Parquet file
///
/// This allows the reader to dynamically choose between decoding strategies
pub(crate) enum RowsPlan {
    /// Read n rows
    Read(usize),
    /// Skip n rows
    Skip(usize),
    /// Reads mask.len() rows then applies the filter mask to select just the desired
    /// rows.
    ///
    /// Any row with a 1 value in the mask will be selected and included
    /// in the output batch.
    ///
    /// This is used in situations where the overhead of preferentially decoding
    /// only the selected rows is higher than decoding all rows and then
    /// applying a mask.
    Mask(BooleanBuffer),
}
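
A minimal sketch of how a decoder might act on such a plan, under stated assumptions: `RowsPlanSketch` and `execute_plan` are hypothetical names that are not part of this PR, `Vec<bool>` stands in for `BooleanBuffer` so the example needs no external crates, and an in-memory slice stands in for the column reader.

```rust
/// Stand-in for the proposed `RowsPlan` enum. `Vec<bool>` replaces arrow's
/// `BooleanBuffer` so this sketch compiles with the standard library only.
#[derive(Debug, Clone, PartialEq)]
pub enum RowsPlanSketch {
    /// Read n rows
    Read(usize),
    /// Skip n rows
    Skip(usize),
    /// Decode mask.len() rows, then keep only rows whose mask entry is true
    Mask(Vec<bool>),
}

/// Applies each plan step, in order, to an in-memory column and returns the
/// selected values. A real decoder would call into the array reader instead
/// of slicing a `&[i32]` (assumption made for illustration).
pub fn execute_plan(column: &[i32], plan: &[RowsPlanSketch]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut pos = 0; // index of the next row that has not been decoded or skipped
    for step in plan {
        match step {
            RowsPlanSketch::Read(n) => {
                let end = (pos + n).min(column.len());
                out.extend_from_slice(&column[pos..end]);
                pos = end;
            }
            RowsPlanSketch::Skip(n) => {
                pos = (pos + n).min(column.len());
            }
            RowsPlanSketch::Mask(mask) => {
                let end = (pos + mask.len()).min(column.len());
                // Decode the whole run, then filter it with the mask.
                out.extend(
                    column[pos..end]
                        .iter()
                        .zip(mask.iter())
                        .filter(|(_, keep)| **keep)
                        .map(|(v, _)| *v),
                );
                pos = end;
            }
        }
    }
    out
}
```

In this sketch a `Mask` over four rows with entries true, false, false, true selects the same rows as `Read(1)`, `Skip(2)`, `Read(1)`. Which form is cheaper for a given selection is the kind of decision the plan builder would make.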

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 21, 2025
/// 1. To have no empty selections (that select no rows)
/// 2. fall on a batch_size boundary (e.g. 0, 100, 200, 300)
///
/// TODO change this structure to an enum with emit + mask
Contributor

Thank you @alamb, this is a great idea. It means we can build the range/bitmap at build time, and the adaptive policy can also be applied here.

/// How to select the next batch of rows to read from the Parquet file
///
/// This allows the reader to dynamically choose between decoding strategies
pub(crate) enum RowsPlan {
Contributor

Beautiful enum, it will include all the cases!

Contributor Author

Thank you.

It isn't quite done yet, but it is getting close.

I am thinking I will likely make 2 PRs

  1. one PR that rearranges when the selectors are created, but still uses RowSelector
  2. a second PR that will switch to use this enum

However, I think I will leave out the BooleanBuffer part of the enum initially, to keep the review load manageable

Then I am thinking we can adapt the code from these PRs

To implement filtering via mask

Contributor

It makes sense!

@alamb alamb force-pushed the alamb/row_selection_plan branch 3 times, most recently from 6bcb7a6 to ee714ca Compare May 22, 2025 16:37
}
continue;
}
while read_records < batch_size {
Contributor Author

The point of this PR is to (further) simplify this inner loop of the parquet decoder.

All the logic for splitting into batch sizes, etc is now done in the ReadPlanBuilder so when this code is invoked it simply does whatever is called for.

The reason for this change is so we can add more complexity into the decision of what to do in subsequent PRs

Contributor

It's clear!

if front.row_count == 0 {
continue;
}
if front.skip {
Contributor Author

Other than error checking, the inner loop now simply reads a RowSelector and does what it says
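
A rough sketch of what such a simplified inner loop could look like, assuming the plan has already been split so that no read selector crosses a batch boundary. `Selector`, `MockReader`, and `next_batch` are hypothetical stand-ins for illustration only; they are not the types or signatures used in the parquet crate.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a row selector: a run of rows to read or to skip.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical reader over `total_rows` rows that records which row
/// indices were decoded.
pub struct MockReader {
    pub total_rows: usize,
    pub pos: usize,
    pub decoded: Vec<usize>,
}

impl MockReader {
    /// Skips up to `n` rows and returns how many were actually skipped.
    pub fn skip_records(&mut self, n: usize) -> usize {
        let skipped = n.min(self.total_rows - self.pos);
        self.pos += skipped;
        skipped
    }

    /// Decodes up to `n` rows and returns how many were actually read.
    pub fn read_records(&mut self, n: usize) -> usize {
        let read = n.min(self.total_rows - self.pos);
        self.decoded.extend(self.pos..self.pos + read);
        self.pos += read;
        read
    }
}

/// Executes planned selectors until `batch_size` rows have been read or the
/// plan is exhausted, and returns the number of rows read. All decisions
/// about where to split were made when the plan was built.
pub fn next_batch(
    reader: &mut MockReader,
    plan: &mut VecDeque<Selector>,
    batch_size: usize,
) -> Result<usize, String> {
    let mut read_records = 0;
    while read_records < batch_size {
        let Some(front) = plan.pop_front() else { break };
        if front.skip {
            let skipped = reader.skip_records(front.row_count);
            if skipped != front.row_count {
                return Err(format!(
                    "failed to skip rows, expected {}, got {}",
                    front.row_count, skipped
                ));
            }
        } else {
            // A short read means the underlying data is exhausted.
            read_records += reader.read_records(front.row_count);
        }
    }
    Ok(read_records)
}
```

The loop body contains no splitting or trimming logic; it only distinguishes skip from read and checks that skips succeed.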

}

#[cfg(test)]
mod tests {
Contributor Author

Technically the plan generation code is already fully covered by other tests in this crate, but I added new unit tests here to:

  1. Document the behavior better
  2. Make it easier to write tests for new behavior (like filter mask implementation)

type Item = RowSelector;

fn next(&mut self) -> Option<Self::Item> {
while let Some(mut front) = self.input_selectors.pop_front() {
Contributor Author

This logic used to be in the RecordBatchReader::next call

It is refactored out into its own module so it can be more easily tested and (eventually) extended.

@@ -358,14 +361,6 @@ impl RowSelection {
self.selectors.iter().any(|x| !x.skip)
}

/// Trims this [`RowSelection`] removing any trailing skips
pub(crate) fn trim(mut self) -> Self {
Contributor Author

this is moved into the plan

@alamb
Contributor Author

alamb commented May 22, 2025

I am pretty happy with how this currently looks, but before I mark it for review I want to make a proof of concept that I can actually improve performance with it

@alamb
Contributor Author

alamb commented May 22, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/row_selection_plan (ee714ca) to 0d774fe diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_row_selection_plan
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 22, 2025

🤖: Benchmark completed

Details

group                                alamb_row_selection_plan               main
-----                                ------------------------               ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.01     14.9±0.10ms        ? ?/sec    1.00     14.8±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.01     16.9±0.16ms        ? ?/sec    1.00     16.7±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.01     39.5±0.32ms        ? ?/sec    1.00     39.0±0.36ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.01     53.2±0.41ms        ? ?/sec    1.00     52.7±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.01     51.0±0.34ms        ? ?/sec    1.00     50.3±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.0±0.05ms        ? ?/sec    1.05      5.2±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.02    163.2±0.59ms        ? ?/sec    1.00    160.2±0.72ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.03    210.9±0.99ms        ? ?/sec    1.00    205.5±0.88ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.04    434.6±1.73ms        ? ?/sec    1.00    419.3±2.68ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.04    505.1±9.31ms        ? ?/sec    1.00    483.4±5.74ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.05     58.7±0.62ms        ? ?/sec    1.00     55.8±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.05    168.4±1.17ms        ? ?/sec    1.00    160.5±1.00ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.07    169.4±0.66ms        ? ?/sec    1.00    158.1±0.79ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.07     67.9±0.27ms        ? ?/sec    1.00     63.6±0.72ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.06    174.5±1.02ms        ? ?/sec    1.00    165.2±1.09ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.06    105.0±0.61ms        ? ?/sec    1.00     98.8±0.77ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.04     40.6±0.24ms        ? ?/sec    1.00     39.0±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.05     50.6±0.25ms        ? ?/sec    1.00     48.2±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.04     54.8±0.38ms        ? ?/sec    1.00     52.9±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.05     41.9±0.18ms        ? ?/sec    1.00     39.8±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.04     14.8±0.15ms        ? ?/sec    1.00     14.3±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.2±0.00ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.03     13.7±0.04ms        ? ?/sec    1.00     13.3±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.02     15.5±0.06ms        ? ?/sec    1.00     15.3±0.09ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.01     41.5±0.20ms        ? ?/sec    1.00     41.0±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.01     54.7±0.33ms        ? ?/sec    1.00     54.3±0.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.01     53.1±0.41ms        ? ?/sec    1.00     52.7±0.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.3±0.01ms        ? ?/sec    1.01      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.02    180.4±0.71ms        ? ?/sec    1.00    177.4±0.65ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.03    239.5±2.89ms        ? ?/sec    1.00    233.0±2.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.04    496.7±3.98ms        ? ?/sec    1.00    477.6±2.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00    438.9±8.52ms        ? ?/sec    1.01   444.3±14.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.02     56.9±1.10ms        ? ?/sec    1.00     55.9±0.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.02    156.6±1.00ms        ? ?/sec    1.00    153.1±0.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.03    155.3±0.91ms        ? ?/sec    1.00    151.4±0.84ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     63.8±0.35ms        ? ?/sec    1.00     63.6±0.53ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.02    161.6±1.35ms        ? ?/sec    1.00    158.5±0.96ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.04     96.3±0.44ms        ? ?/sec    1.00     92.6±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     32.2±0.42ms        ? ?/sec    1.00     32.1±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     35.1±0.37ms        ? ?/sec    1.01     35.3±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     49.9±0.26ms        ? ?/sec    1.02     50.8±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     37.5±0.18ms        ? ?/sec    1.02     38.3±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.6±0.05ms        ? ?/sec    1.02     13.9±0.13ms        ? ?/sec

Contributor

@zhuqi-lucas zhuqi-lucas left a comment

LGTM Thank you @alamb, great work!

}
continue;
}
while read_records < batch_size {
Contributor

It's clear!

Comment on lines +149 to +157
/// The returned stream of [`RowSelector`]s is guaranteed to have:
/// 1. No empty selections (that select no rows)
/// 2. No selections that span batch_size boundaries
/// 3. No trailing skip selections
///
/// For example, if the `batch_size` is 100 and we are selecting all 200 rows
/// from a Parquet file, the selectors will be:
/// - `RowSelector::select(100) <-- forced break at batch_size boundary`
/// - `RowSelector::select(100)`
Contributor

Great work !
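
The three guarantees in the doc comment above can be illustrated with a small sketch. It assumes that batch boundaries are counted in selected (output) rows and that skips do not count toward a batch; `Selector` and `plan_selectors` are hypothetical names for illustration, not the code in this PR.

```rust
/// Simplified stand-in for a row selector: a run of rows to read or to skip.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

impl Selector {
    pub fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    pub fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Rewrites `input` so that the result has:
/// 1. no empty selectors,
/// 2. no select that crosses a `batch_size` boundary (counted in selected rows),
/// 3. no trailing skips.
pub fn plan_selectors(input: &[Selector], batch_size: usize) -> Vec<Selector> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut out = Vec::new();
    let mut in_batch = 0; // rows selected so far in the current batch
    for sel in input {
        if sel.row_count == 0 {
            continue; // guarantee 1: drop empty selectors
        }
        if sel.skip {
            out.push(*sel);
            continue;
        }
        // Guarantee 2: cut the select wherever the current batch fills up.
        let mut remaining = sel.row_count;
        while remaining > 0 {
            let take = remaining.min(batch_size - in_batch);
            out.push(Selector::select(take));
            remaining -= take;
            in_batch = (in_batch + take) % batch_size;
        }
    }
    // Guarantee 3: trailing skips select nothing, so drop them.
    while out.last().map_or(false, |s| s.skip) {
        out.pop();
    }
    out
}
```

With a `batch_size` of 100, selecting all 200 rows yields two selects of 100 rows each, matching the example in the doc comment. Because the function is pure, each guarantee can be unit tested without touching the decoder.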

));
}
} else {
let read = self.array_reader.read_records(front.row_count)?;
Contributor

Minor: do we have a fast path when read < front.row_count?

Contributor Author

I am not quite sure what you mean here -- if read < row_count I think that means the array is exhausted and the row group is done.

What sort of fast path would it be?

Contributor

@zhuqi-lucas zhuqi-lucas May 23, 2025

> I am not quite sure what you mean here -- if read < row_count I think that means the array is exhausted and the row group is done.
>
> What sort of fast path would it be?

That's right @alamb, sorry, I did not describe it correctly. I meant: do we need to break early in that case? It looks like that is not needed.

@alamb
Contributor Author

alamb commented May 23, 2025

🤖: Benchmark completed

🤔 it seems it is less efficient

@alamb
Contributor Author

alamb commented May 23, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/row_selection_plan (ee714ca) to 0d774fe diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_row_selection_plan
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 23, 2025

Update here is that this is not looking super promising and I am somewhat stuck with how to integrate mask based selections into the logic more cleanly. I need to think about it some more.

I may park this for a while and continue working on filter results caching some more

@alamb
Contributor Author

alamb commented May 23, 2025

🤖: Benchmark completed

Details

group                                alamb_row_selection_plan               main
-----                                ------------------------               ----
arrow_reader_clickbench/async/Q1     1.01      2.4±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.05     15.4±0.18ms        ? ?/sec    1.00     14.6±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.02     16.9±0.11ms        ? ?/sec    1.00     16.5±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.04     39.8±0.35ms        ? ?/sec    1.00     38.1±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.01     53.7±0.44ms        ? ?/sec    1.00     53.4±0.56ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.02     51.4±0.34ms        ? ?/sec    1.00     50.4±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.1±0.04ms        ? ?/sec    1.00      5.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.03    164.3±0.52ms        ? ?/sec    1.00    159.0±1.15ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.13    231.1±1.03ms        ? ?/sec    1.00    204.3±1.42ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.08    513.1±1.80ms        ? ?/sec    1.00    475.6±2.61ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.02    496.8±8.85ms        ? ?/sec    1.00   486.4±11.94ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.02     59.2±0.90ms        ? ?/sec    1.00     58.1±0.73ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.03    167.1±0.70ms        ? ?/sec    1.00    161.8±0.83ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.04    167.4±1.78ms        ? ?/sec    1.00    160.3±0.84ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.04     66.7±0.33ms        ? ?/sec    1.00     64.4±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.04    173.2±0.94ms        ? ?/sec    1.00    166.5±0.90ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.05    104.0±0.72ms        ? ?/sec    1.00     98.8±0.67ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.02     39.8±0.30ms        ? ?/sec    1.00     38.9±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.04     50.0±0.87ms        ? ?/sec    1.00     48.0±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     53.9±0.57ms        ? ?/sec    1.00     54.0±0.36ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.01     40.5±0.33ms        ? ?/sec    1.00     40.2±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.4±0.09ms        ? ?/sec    1.01     14.5±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.04     13.7±0.07ms        ? ?/sec    1.00     13.2±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.03     15.6±0.09ms        ? ?/sec    1.00     15.1±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     38.5±0.21ms        ? ?/sec    1.05     40.2±0.44ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     51.1±0.24ms        ? ?/sec    1.07     54.7±0.64ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     50.0±0.34ms        ? ?/sec    1.02     51.1±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.01      4.3±0.03ms        ? ?/sec    1.00      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.03    179.7±0.75ms        ? ?/sec    1.00    174.5±1.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.03    236.0±1.17ms        ? ?/sec    1.00    230.2±2.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.04    492.8±1.95ms        ? ?/sec    1.00    472.5±2.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.04   456.8±15.44ms        ? ?/sec    1.00   440.6±12.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     56.3±0.62ms        ? ?/sec    1.00     56.4±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.05    159.2±1.04ms        ? ?/sec    1.00    152.0±0.86ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.04    157.1±1.01ms        ? ?/sec    1.00    151.3±0.72ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.04     64.5±0.37ms        ? ?/sec    1.00     62.2±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.03    161.2±1.01ms        ? ?/sec    1.00    156.0±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.05     97.0±0.52ms        ? ?/sec    1.00     92.0±0.62ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.02     32.6±0.25ms        ? ?/sec    1.00     32.1±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.03     35.4±0.27ms        ? ?/sec    1.00     34.5±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     50.6±0.37ms        ? ?/sec    1.01     50.9±0.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     38.0±0.24ms        ? ?/sec    1.00     37.9±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.7±0.05ms        ? ?/sec    1.01     13.8±0.08ms        ? ?/sec

@zhuqi-lucas
Contributor

> Update here is that this is not looking super promising and I am somewhat stuck with how to integrate mask based selections into the logic more cleanly. I need to think about it some more.
>
> I may park this for a while and continue working on filter results caching some more

Thank you very much @alamb, I can continue to help investigate why this PR shows some performance regression.

@zhuqi-lucas
Contributor

It looks like removing this check improves performance a little, but some cases still regress. I have not found the root cause yet.

        // Reader should read exactly `batch_size` records except for last batch
        if !end_of_stream && (read_records != batch_size) {
            return Err(general_err!(
                "Internal Error: unexpected read count. Expected {batch_size} got {read_records}"
            ));
        }

@alamb
Contributor Author

alamb commented May 23, 2025

> It looks like removing this check improves performance a little, but some cases still regress. I have not found the root cause yet.

Thanks @zhuqi-lucas -- I will remove that check

@alamb
Contributor Author

alamb commented May 23, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/row_selection_plan (00a0f1f) to 0d774fe diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_row_selection_plan
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 23, 2025

🤖: Benchmark completed

Details

group                                alamb_row_selection_plan               main
-----                                ------------------------               ----
arrow_reader_clickbench/async/Q1     1.01      2.4±0.05ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     14.1±0.10ms        ? ?/sec    1.01     14.3±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     16.4±0.15ms        ? ?/sec    1.00     16.4±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     37.4±0.20ms        ? ?/sec    1.01     37.8±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     50.9±0.38ms        ? ?/sec    1.01     51.6±0.64ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     48.5±0.20ms        ? ?/sec    1.01     48.9±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      4.9±0.04ms        ? ?/sec    1.01      5.0±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    157.3±0.67ms        ? ?/sec    1.08    170.2±0.62ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    220.4±0.81ms        ? ?/sec    1.00    220.4±1.27ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.01    487.1±1.89ms        ? ?/sec    1.00    482.5±3.29ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.01   488.8±19.49ms        ? ?/sec    1.00    484.6±9.86ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.01     57.1±0.82ms        ? ?/sec    1.00     56.4±0.42ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    161.4±0.98ms        ? ?/sec    1.00    160.7±1.14ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    158.9±0.78ms        ? ?/sec    1.01    159.8±1.08ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.02     64.4±0.50ms        ? ?/sec    1.00     62.9±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    165.7±0.96ms        ? ?/sec    1.00    165.3±1.03ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     98.5±0.54ms        ? ?/sec    1.00     98.0±0.52ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     38.4±0.28ms        ? ?/sec    1.01     38.7±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     47.5±0.30ms        ? ?/sec    1.01     48.1±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     53.1±0.42ms        ? ?/sec    1.00     53.2±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     39.9±0.28ms        ? ?/sec    1.00     39.8±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.3±0.09ms        ? ?/sec    1.00     14.3±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.01      2.2±0.01ms        ? ?/sec    1.00      2.2±0.00ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.01     13.1±0.06ms        ? ?/sec    1.00     13.0±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.01     15.0±0.06ms        ? ?/sec    1.00     14.9±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.01     39.5±0.22ms        ? ?/sec    1.00     39.3±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.03     52.7±0.42ms        ? ?/sec    1.00     51.3±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     50.3±0.37ms        ? ?/sec    1.00     50.3±0.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.03ms        ? ?/sec    1.02      4.3±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    173.9±0.68ms        ? ?/sec    1.00    174.5±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    228.7±3.39ms        ? ?/sec    1.00    227.9±2.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    468.2±1.80ms        ? ?/sec    1.00    468.1±2.34ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.03   438.5±15.37ms        ? ?/sec    1.00   426.0±12.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.02     54.3±0.35ms        ? ?/sec    1.00     53.5±0.56ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    150.2±0.74ms        ? ?/sec    1.00    150.7±0.78ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    148.4±0.85ms        ? ?/sec    1.00    148.9±0.75ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.01     62.0±0.34ms        ? ?/sec    1.00     61.2±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    154.0±0.89ms        ? ?/sec    1.00    154.5±0.92ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     90.2±0.44ms        ? ?/sec    1.00     90.6±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     31.2±0.25ms        ? ?/sec    1.00     31.3±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.4±0.19ms        ? ?/sec    1.02     33.9±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     49.5±0.28ms        ? ?/sec    1.00     49.3±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     37.0±0.22ms        ? ?/sec    1.00     37.0±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.01     13.6±0.12ms        ? ?/sec    1.00     13.5±0.05ms        ? ?/sec

/// how many records have been read by RowSelection in the "current" batch
read_records: usize,
/// Input selectors to read from
input_selectors: VecDeque<RowSelector>,
Contributor

I think Vec can be used here (track an index)?

Contributor Author

I used VecDeque as that is how the current code does it

I can try and see if it makes any difference.
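
For reference, a minimal sketch of the `Vec` plus index alternative suggested above. `Selector` and `SelectorCursor` are hypothetical names for illustration, and whether this is measurably faster than `VecDeque::pop_front` would need to be benchmarked.

```rust
/// Simplified stand-in for a row selector: a run of rows to read or to skip.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

/// Cursor over a `Vec` of selectors. Instead of popping from the front of a
/// `VecDeque`, it advances an index into the `Vec`.
pub struct SelectorCursor {
    selectors: Vec<Selector>,
    /// Index of the next selector to hand out
    next: usize,
}

impl SelectorCursor {
    pub fn new(selectors: Vec<Selector>) -> Self {
        Self { selectors, next: 0 }
    }
}

impl Iterator for SelectorCursor {
    type Item = Selector;

    fn next(&mut self) -> Option<Selector> {
        // `get` returns None once the index runs past the end, so the
        // cursor keeps returning None after it is exhausted.
        let item = self.selectors.get(self.next).copied()?;
        self.next += 1;
        Some(item)
    }
}
```

If the iterator ever needs to put back the unread part of a selector, one option under this design would be to shrink the `row_count` of the element at the current index in place rather than pushing to the front.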

@alamb
Contributor Author

alamb commented May 23, 2025

Ok, I think the latest benchmark results are now better and show no regression thanks to @zhuqi-lucas's suggestion. I will try @Dandandan's idea to use a Vec and see if that helps

@alamb
Contributor Author

alamb commented May 23, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/row_selection_plan (76b1ef6) to 0d774fe diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_row_selection_plan
Results will be posted here when complete

@alamb alamb force-pushed the alamb/row_selection_plan branch from 76b1ef6 to fb87e2c Compare May 23, 2025 19:00
@alamb
Contributor Author

alamb commented May 23, 2025

🤖: Benchmark completed

Details

group                                alamb_row_selection_plan               main
-----                                ------------------------               ----
arrow_reader_clickbench/async/Q1     1.00      2.4±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.02     14.5±0.06ms        ? ?/sec    1.00     14.2±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.02     16.3±0.07ms        ? ?/sec    1.00     16.0±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.05     39.2±0.22ms        ? ?/sec    1.00     37.3±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.04     52.5±0.27ms        ? ?/sec    1.00     50.6±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.04     50.6±0.40ms        ? ?/sec    1.00     48.5±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.0±0.19ms        ? ?/sec    1.00      5.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.03    162.5±0.59ms        ? ?/sec    1.00    157.3±0.76ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.04    208.9±0.91ms        ? ?/sec    1.00    201.8±0.67ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.07    433.8±1.22ms        ? ?/sec    1.00    407.3±1.45ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.03   496.1±14.49ms        ? ?/sec    1.00    481.9±5.63ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.03     57.6±0.38ms        ? ?/sec    1.00     56.2±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.03    165.4±1.09ms        ? ?/sec    1.00    160.1±0.91ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.03    162.7±0.90ms        ? ?/sec    1.00    158.4±1.39ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.02     64.8±0.38ms        ? ?/sec    1.00     63.2±0.45ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.03    168.6±0.91ms        ? ?/sec    1.00    164.2±0.87ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.06    103.2±0.53ms        ? ?/sec    1.00     97.7±0.42ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.02     39.1±0.46ms        ? ?/sec    1.00     38.5±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.03     48.8±0.72ms        ? ?/sec    1.00     47.5±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     52.9±0.59ms        ? ?/sec    1.01     53.1±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     40.1±0.32ms        ? ?/sec    1.00     39.9±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.2±0.10ms        ? ?/sec    1.00     14.3±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.2±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.02     13.3±0.06ms        ? ?/sec    1.00     13.0±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.02     15.2±0.07ms        ? ?/sec    1.00     14.9±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.05     41.4±0.26ms        ? ?/sec    1.00     39.4±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.04     53.7±0.31ms        ? ?/sec    1.00     51.6±0.42ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.06     52.5±0.43ms        ? ?/sec    1.00     49.6±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.3±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.03    178.4±0.82ms        ? ?/sec    1.00    173.9±0.57ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.04    238.4±1.66ms        ? ?/sec    1.00    229.9±2.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.05    491.4±3.19ms        ? ?/sec    1.00    467.3±2.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00    431.0±6.85ms        ? ?/sec    1.01   435.4±16.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.03     54.8±0.52ms        ? ?/sec    1.00     53.2±0.63ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.03    155.4±0.63ms        ? ?/sec    1.00    151.0±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.03    153.0±0.65ms        ? ?/sec    1.00    148.4±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.02     62.4±0.33ms        ? ?/sec    1.00     61.0±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.02    157.7±0.70ms        ? ?/sec    1.00    153.9±0.84ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.06     95.5±0.59ms        ? ?/sec    1.00     90.1±0.66ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.02     31.6±0.27ms        ? ?/sec    1.00     31.0±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.03     34.5±0.26ms        ? ?/sec    1.00     33.5±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.01     49.3±0.18ms        ? ?/sec    1.00     49.0±0.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     36.9±0.12ms        ? ?/sec    1.00     36.8±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.5±0.07ms        ? ?/sec    1.00     13.5±0.11ms        ? ?/sec

@alamb
Contributor Author

alamb commented May 23, 2025

🤔 seems like vec made things slower (see results here)

@alamb alamb force-pushed the alamb/row_selection_plan branch from fb87e2c to 948374a Compare May 23, 2025 19:32
@alamb
Contributor Author

alamb commented May 23, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/row_selection_plan (948374a) to e9df239 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_row_selection_plan
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 23, 2025

🤖: Benchmark completed

Details

group                                alamb_row_selection_plan               main
-----                                ------------------------               ----
arrow_reader_clickbench/async/Q1     1.00      2.4±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     14.3±0.50ms        ? ?/sec    1.00     14.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     16.1±0.09ms        ? ?/sec    1.00     16.2±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     37.2±0.24ms        ? ?/sec    1.00     37.1±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.01     50.9±0.34ms        ? ?/sec    1.00     50.5±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.01     48.7±0.38ms        ? ?/sec    1.00     48.2±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      4.9±0.03ms        ? ?/sec    1.02      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    158.0±0.59ms        ? ?/sec    1.00    157.2±0.79ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    202.5±0.87ms        ? ?/sec    1.07    217.4±1.07ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    405.0±1.25ms        ? ?/sec    1.18    478.0±2.49ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.09    506.7±7.67ms        ? ?/sec    1.00    463.4±4.56ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.01     56.7±0.81ms        ? ?/sec    1.00     56.1±1.00ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    159.5±0.91ms        ? ?/sec    1.01    160.3±0.69ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    157.2±0.92ms        ? ?/sec    1.01    158.5±0.78ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.03     65.1±0.78ms        ? ?/sec    1.00     63.2±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    165.1±1.18ms        ? ?/sec    1.00    165.3±0.81ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.01     98.3±0.41ms        ? ?/sec    1.00     97.8±0.42ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     38.4±0.27ms        ? ?/sec    1.00     38.5±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.01     48.3±0.26ms        ? ?/sec    1.00     47.7±0.30ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     53.0±0.63ms        ? ?/sec    1.01     53.4±0.92ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.01     40.1±0.33ms        ? ?/sec    1.00     39.8±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.01     14.3±0.10ms        ? ?/sec    1.00     14.2±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.01      2.2±0.01ms        ? ?/sec    1.00      2.2±0.00ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     13.1±0.06ms        ? ?/sec    1.00     13.0±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     15.0±0.05ms        ? ?/sec    1.00     14.9±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     39.3±0.27ms        ? ?/sec    1.00     39.1±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.01     52.3±0.32ms        ? ?/sec    1.00     51.8±0.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     50.6±0.33ms        ? ?/sec    1.00     50.5±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.01ms        ? ?/sec    1.02      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    174.8±0.73ms        ? ?/sec    1.00    174.1±0.77ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    230.1±2.96ms        ? ?/sec    1.00    230.1±1.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    467.1±1.94ms        ? ?/sec    1.01    469.8±2.79ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.01   439.6±12.85ms        ? ?/sec    1.00   434.9±15.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     53.6±0.53ms        ? ?/sec    1.00     53.4±0.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    150.8±0.81ms        ? ?/sec    1.00    150.4±0.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.01    150.1±0.84ms        ? ?/sec    1.00    148.9±0.94ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.01     62.2±0.32ms        ? ?/sec    1.00     61.3±0.40ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    154.1±0.78ms        ? ?/sec    1.00    154.1±0.75ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     90.4±0.59ms        ? ?/sec    1.00     90.5±0.55ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.01     31.1±0.34ms        ? ?/sec    1.00     30.8±0.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.7±0.28ms        ? ?/sec    1.00     33.6±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     49.6±0.26ms        ? ?/sec    1.00     49.4±0.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     37.2±0.17ms        ? ?/sec    1.00     37.0±0.34ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.01     13.7±0.09ms        ? ?/sec    1.00     13.5±0.03ms        ? ?/sec

@zhuqi-lucas
Contributor

> Ok, the latest benchmark results I think are now better and show no regression thanks to @zhuqi-lucas's suggestion. I will try @Dandandan's idea to use a Vec and see if that helps

Thank you @alamb, that's good news!

@alamb alamb force-pushed the alamb/row_selection_plan branch from 948374a to 72a8114 Compare May 24, 2025 11:44
@alamb
Contributor Author

alamb commented May 24, 2025

> > Ok, the latest benchmark results I think are now better and show no regression thanks to @zhuqi-lucas's suggestion. I will try @Dandandan's idea to use a Vec and see if that helps
>
> Thank you @alamb, that's good news!

This PR has shown me that for some queries the overhead of the dispatch logic for RowSelection is quite high (as in, just doing an extra compare in that loop made a measurable difference).

@zhuqi-lucas in your testing, did you measure where the cutoff for using a bitmap vs a RowSelector was? I think I remember seeing a value of 10 somewhere

@zhuqi-lucas
Contributor

> > > Ok, the latest benchmark results I think are now better and show no regression thanks to @zhuqi-lucas's suggestion. I will try @Dandandan's idea to use a Vec and see if that helps
> >
> > Thank you @alamb, that's good news!
>
> This PR has shown me that for some queries the overhead of the dispatch logic for RowSelection is quite high (as in, just doing an extra compare in that loop made a measurable difference).
>
> @zhuqi-lucas in your testing, did you measure where the cutoff for using a bitmap vs a RowSelector was? I think I remember seeing a value of 10 somewhere

I agree @alamb, I was testing with 10 as the cutoff for using a bitmap vs a RowSelector. It's a very basic cutoff:

avg_size_of_selector = total rows / number of selectors

if avg_size_of_selector > 10, use the selector

if avg_size_of_selector <= 10, use the bitmap

And the default is the selector, because I use it to compute avg_size_of_selector.
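A minimal sketch of the cutoff described above, in std-only Rust. `Run`, `Strategy`, `AVG_RUN_CUTOFF` and `choose_strategy` are hypothetical names used for illustration; `Run` only stands in for the parquet crate's `RowSelector`, and none of this is the code from the PR.

```rust
/// Hypothetical stand-in for a row selector: a run of `row_count` rows
/// that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Which representation to decode with (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Selector,
    Bitmap,
}

/// Cutoff taken from the discussion; a tuning guess, not a measured optimum.
pub const AVG_RUN_CUTOFF: usize = 10;

/// Picks a strategy from the average run size (total rows / number of runs).
pub fn choose_strategy(runs: &[Run]) -> Strategy {
    if runs.is_empty() {
        // Nothing to average over, so fall back to the default.
        return Strategy::Selector;
    }
    let total_rows: usize = runs.iter().map(|r| r.row_count).sum();
    // `total / len > cutoff` rewritten as a multiplication so that
    // integer division cannot round the average down.
    if total_rows > AVG_RUN_CUTOFF * runs.len() {
        Strategy::Selector
    } else {
        Strategy::Bitmap
    }
}
```

With this shape, an average run size of exactly 10 falls on the bitmap side, matching the `<= 10` rule above.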

@alamb
Contributor Author

alamb commented May 24, 2025

> And the default is the selector, because I use it to compute avg_size_of_selector.

Makes sense -- thank you

I found SlicesIterator when looking at the Bitmap --> RowSelection code the other day. I think that could be used to determine the "average run length" so we could continue to use skip/select for large contiguous runs but switch to bitmap when the runs are smaller

The other thing I couldn't easily work out was if there was any way to switch from select/skip within an output batch, or if the plan needs to be either RowSelector or BitMap for each output batch

Or maybe we could just add a third type of ReadPlan, namely ReadPlan::Bitmap 🤔
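To make the "average run length" idea concrete, here is a rough std-only sketch. It uses a plain `&[bool]` as a stand-in for the mask, and `selected_runs` only imitates the `(start, end)` ranges that, as I understand it, `SlicesIterator` yields for set bits; both function names are hypothetical.

```rust
/// Returns `(start, end)` half-open ranges of consecutive `true` values.
/// A std-only stand-in for iterating the set-bit slices of a mask.
pub fn selected_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &bit) in mask.iter().enumerate() {
        match (bit, start) {
            // A new run of selected rows begins here.
            (true, None) => start = Some(i),
            // The current run ends just before this unselected row.
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// Average length of the selected runs, or `None` if nothing is selected.
pub fn average_run_length(mask: &[bool]) -> Option<f64> {
    let runs = selected_runs(mask);
    if runs.is_empty() {
        return None;
    }
    let selected: usize = runs.iter().map(|(s, e)| e - s).sum();
    Some(selected as f64 / runs.len() as f64)
}
```

A plan builder could compare this average against a cutoff to decide between skip/select and a bitmap; where that cutoff should sit is exactly the open question in this thread.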

@zhuqi-lucas
Contributor

> > And the default is the selector, because I use it to compute avg_size_of_selector.
>
> Makes sense -- thank you
>
> I found SlicesIterator when looking at the Bitmap --> RowSelection code the other day. I think that could be used to determine the "average run length" so we could continue to use skip/select for large contiguous runs but switch to bitmap when the runs are smaller
>
> The other thing I couldn't easily work out was if there was any way to switch from select/skip within an output batch, or if the plan needs to be either RowSelector or BitMap for each output batch
>
> Or maybe we could just add a third type of ReadPlan, namely ReadPlan::Bitmap 🤔

Thank you @alamb, this is a very good point:

  1. I was testing per output batch: we use either RowSelector or BitMap for each output batch.

This is because consecutive batches may differ: 8192 => bitmap, 8192 => selector, 8192 => bitmap...

We can't use only one of them and still be optimal.

  2. I think the best way to optimize is:
  • We have a basic default window size for the adaptive decision, the batch size of 8192; just like in the case above, we choose bitmap or selector per batch-size window.
  • But we also support merging adjacent windows of the same type:

For example, say that after selection we have 5 batch-size windows:

  1. 8192 => bitmap
  2. 8192 => bitmap
  3. 8192 => selector
  4. 8192 => selector
  5. 8192 => bitmap

We can merge 1 and 2 because they are both bitmap.
We can merge 3 and 4 because they are both selector.
And the remaining one (5) is a bitmap.
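The window merging in the example above could look roughly like this. It is only a sketch: `Window`, `Strategy` and `merge_windows` are made-up names, and it assumes the windows arrive sorted and contiguous.

```rust
use std::ops::Range;

/// Which representation a window is decoded with (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Selector,
    Bitmap,
}

/// A contiguous range of rows decoded with a single strategy.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Window {
    pub rows: Range<usize>,
    pub strategy: Strategy,
}

/// Merges neighbouring windows that share a strategy.
/// Assumes the input is sorted by row and has no gaps between windows.
pub fn merge_windows(windows: Vec<Window>) -> Vec<Window> {
    let mut merged: Vec<Window> = Vec::with_capacity(windows.len());
    for w in windows {
        if let Some(prev) = merged.last_mut() {
            if prev.strategy == w.strategy && prev.rows.end == w.rows.start {
                // Same strategy and adjacent rows: extend the previous window.
                prev.rows.end = w.rows.end;
                continue;
            }
        }
        merged.push(w);
    }
    merged
}
```

For the five windows listed above this yields three windows: rows 0..16384 as bitmap, 16384..32768 as selector, and 32768..40960 as bitmap.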

But I think we can start from the basic optimization: only use the batch-size window to decide between bitmap and selector. Later, we can optimize further.

Maybe we can only have selector for ReadPlan, but for each adaptive window (currently fixed at the batch size), we can change to bitmap if it's dense, as the first step...

@alamb
Contributor Author

alamb commented May 25, 2025

> But I think we can start from the basic optimization: only use the batch-size window to decide between bitmap and selector. Later, we can optimize further.

This is an interesting idea and I think it is worth exploring

> Maybe we can only have selector for ReadPlan, but for each adaptive window (currently fixed at the batch size), we can change to bitmap if it's dense, as the first step...

👍

Another thing that makes this tricky in my mind is that if batch_size is 8000, the total number of 1s in the mask needs to be 8000 -- the mask itself can be substantially larger (e.g. it could be 16000 long and select every other row) 🤔
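A small sketch of that mismatch, using a plain `&[bool]` as the mask. `mask_len_for_batch` is a hypothetical helper, not an existing arrow-rs function; it just counts how many mask bits have to be consumed before `batch_size` rows are selected.

```rust
/// Returns how many leading mask bits must be consumed to collect
/// `batch_size` selected rows, or `None` if the mask has fewer set bits.
pub fn mask_len_for_batch(mask: &[bool], batch_size: usize) -> Option<usize> {
    if batch_size == 0 {
        return Some(0);
    }
    let mut selected = 0;
    for (i, &bit) in mask.iter().enumerate() {
        if bit {
            selected += 1;
            if selected == batch_size {
                // Bits 0..=i were needed, i.e. i + 1 bits in total.
                return Some(i + 1);
            }
        }
    }
    None
}
```

For a 16000-bit mask that selects every other row starting at row 0, collecting 8000 rows consumes 15999 mask bits, while a fully dense mask needs only 8000.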

@zhuqi-lucas
Contributor

> > But I think we can start from the basic optimization: only use the batch-size window to decide between bitmap and selector. Later, we can optimize further.
>
> This is an interesting idea and I think it is worth exploring
>
> > Maybe we can only have selector for ReadPlan, but for each adaptive window (currently fixed at the batch size), we can change to bitmap if it's dense, as the first step...
>
> 👍
>
> Another thing that makes this tricky in my mind is that if batch_size is 8000, the total number of 1s in the mask needs to be 8000 -- the mask itself can be substantially larger (e.g. it could be 16000 long and select every other row) 🤔

Very good point, @alamb! It's hard for us to reduce its overhead; maybe we can set something like max_bitmap_iterator:

When the bitmap iterator goes past max_bitmap_iterator, we can first consume what we have as an output batch, and then merge those batches at the end. But I am not sure whether it will make the performance worse than using the selector.
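One possible reading of the max_bitmap_iterator idea, as a std-only sketch: cut the mask into chunks that close either when they contain `batch_size` selected rows or when they span `max_bitmap_iterator` bits. `split_mask` is a hypothetical helper, and the final merging of the partial batches is not shown.

```rust
use std::ops::Range;

/// Splits `mask` into consecutive chunks. A chunk is closed as soon as it
/// contains `batch_size` selected rows or spans `max_bitmap_iterator` bits,
/// whichever comes first. Both limits must be non-zero.
pub fn split_mask(
    mask: &[bool],
    batch_size: usize,
    max_bitmap_iterator: usize,
) -> Vec<Range<usize>> {
    assert!(batch_size > 0 && max_bitmap_iterator > 0);
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut selected = 0;
    for (i, &bit) in mask.iter().enumerate() {
        if bit {
            selected += 1;
        }
        let len = i + 1 - start;
        if selected == batch_size || len == max_bitmap_iterator {
            chunks.push(start..i + 1);
            start = i + 1;
            selected = 0;
        }
    }
    // Whatever is left forms a final, possibly short, chunk.
    if start < mask.len() {
        chunks.push(start..mask.len());
    }
    chunks
}
```

With a 16000-bit mask selecting every other row, `batch_size = 8000` and `max_bitmap_iterator = 8192`, this produces two chunks (0..8192 and 8192..16000), each holding fewer than 8000 selected rows, so they would still need to be combined into one output batch afterwards.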

Labels
parquet Changes to the parquet crate
3 participants