Draft: Use take-in kernel in repartitioning #15392

Wants to merge 5 commits into main (draft)

ctsk (Contributor) commented Mar 24, 2025

Combined with apache/arrow-rs#7325, this PR tries to use the take_in kernel in repartitioning. The goal is to elide the coalesce step that currently follows repartitioning.
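
To illustrate the idea only: the sketch below is not code from this PR or from arrow-rs. It assumes plain `Vec<i64>` buffers in place of Arrow arrays, and `take_into` is a hypothetical stand-in for a take_in-style kernel that appends selected rows to an existing output instead of allocating a new array. Under those assumptions, repartitioning can accumulate rows straight into one buffer per partition, so there are no small per-batch outputs left to coalesce afterwards.

```rust
// Conceptual sketch: Vec<i64> stands in for an Arrow array, and
// `take_into` is a hypothetical stand-in for a take_in-style kernel.

/// Appends `values[i]` for each `i` in `indices` to an existing buffer.
fn take_into(values: &[i64], indices: &[usize], out: &mut Vec<i64>) {
    out.extend(indices.iter().map(|&i| values[i]));
}

/// Routes each row to partition `value mod n` (assumes `n > 0`),
/// accumulating directly into one output buffer per partition.
fn repartition(batches: &[Vec<i64>], n: usize) -> Vec<Vec<i64>> {
    let mut outputs: Vec<Vec<i64>> = vec![Vec::new(); n];
    for batch in batches {
        // Group the row indices of this batch by target partition.
        let mut indices: Vec<Vec<usize>> = vec![Vec::new(); n];
        for (row, &v) in batch.iter().enumerate() {
            indices[v.rem_euclid(n as i64) as usize].push(row);
        }
        // Append into the per-partition buffers rather than emitting
        // one small batch per (input batch, partition) pair.
        for (p, idx) in indices.iter().enumerate() {
            take_into(batch, idx, &mut outputs[p]);
        }
    }
    outputs
}

fn main() {
    let batches = vec![vec![1, 2, 3, 4], vec![5, 6, 7, 8]];
    let parts = repartition(&batches, 2);
    println!("{:?}", parts);
}
```

A real implementation would also have to decide when a per-partition buffer is full enough to emit, which this sketch leaves out.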

alamb (Contributor) commented Mar 25, 2025

Hi @ctsk -- is this PR ready for running some benchmarks?

ctsk (Contributor, Author) commented Mar 26, 2025

@alamb This PR should be able to run benchmarks now. I've added overrides to use the modified version of arrow in the PR and a lockfile to avoid chrono issues. At least it can run tpch :)

alamb (Contributor) commented Mar 27, 2025

I am firing up the benchmarks

alamb (Contributor) commented Mar 27, 2025

I tried to run the clickbench queries using bench.sh and I got an error like this:

Q1: SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTINCT "BrowserLanguage")  FROM hits;
Query 1 iteration 0 took 760.6 ms and returned 1 rows
Query 1 iteration 1 took 785.4 ms and returned 1 rows
Query 1 iteration 2 took 787.9 ms and returned 1 rows
Query 1 iteration 3 took 775.5 ms and returned 1 rows
Query 1 iteration 4 took 786.6 ms and returned 1 rows
Q2: SELECT "BrowserCountry",  COUNT(DISTINCT "SocialNetwork"), COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserLanguage"), COUNT(DISTINCT "SocialAction") FROM hits GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

thread 'tokio-runtime-worker' panicked at /home/alamb/.cargo/git/checkouts/arrow-rs-583cca34693b79b8/368c1e6/arrow-array/src/builder/mod.rs:509:35:
not yet implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: Context("Join Error", External(JoinError::Panic(Id(2790), "not yet implemented", ...)))

Labels: optimizer (Optimizer rules)