Refactor: improve query spill #16605

Open
1 of 4 tasks
Dousir9 opened this issue Oct 14, 2024 · 6 comments

Comments

@Dousir9
Contributor

Dousir9 commented Oct 14, 2024

Summary

@inviscid

While the throughput speed to object storage is quite fast, the I/O is not nearly as good as local SSD/NVMe. For spill operations, if they are I/O bound, could we spill to the local cache disk rather than object storage?

We also want to make sure all spill and temp files are cleaned up after the query operation completes or fails. There might be evidence that these temp/spill files persist.
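The cleanup requirement above can be sketched as a scoped spill directory that is removed whether the query succeeds or fails. This is a minimal Python illustration, not Databend's implementation; `query_spill_dir` and `run_query` are hypothetical names.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def query_spill_dir(root, query_id):
    """Create a per-query spill directory and always remove it on exit."""
    path = tempfile.mkdtemp(prefix=f"{query_id}-", dir=root)
    try:
        yield path
    finally:
        # Runs on success and on failure, so no spill files outlive the query.
        shutil.rmtree(path, ignore_errors=True)


def run_query(root, query_id, fail=False):
    """Hypothetical query: writes one spill file, then optionally fails."""
    with query_spill_dir(root, query_id) as spill:
        with open(os.path.join(spill, "part-0.spill"), "wb") as f:
            f.write(b"\x00" * 1024)
        if fail:
            raise RuntimeError("query failed after spilling")
    return "ok"
```

A guard like this does not cover a process that is killed outright; that case would need a separate sweep of stale directories at startup.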

@wubx
Member

wubx commented Feb 5, 2025

Yes, the latest version supports spilling to local disk.

Add this to the config file:

[spill]
spill_local_disk_path = "/data1/databend/databend_spill"
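Before pointing `spill_local_disk_path` at a directory, it may be worth checking that the directory exists, is writable, and has free space. A small sketch, assuming a hypothetical helper `check_spill_dir` (not part of Databend):

```python
import os
import shutil


def check_spill_dir(path, min_free_bytes=0):
    """Return a list of problems with a local spill directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} is not an existing directory"]
    problems = []
    # Creating files in a directory needs both write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, want at least {min_free_bytes}")
    return problems
```

For example, `check_spill_dir("/data1/databend/databend_spill", min_free_bytes=50 * 2**30)` would report whether that path has at least 50 GiB free; the threshold is an arbitrary illustration.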

@soyeric128
Collaborator

> Yes, the latest version supports spilling to local disk.
>
> Add this to the config file:
>
> [spill]
> spill_local_disk_path = "/data1/databend/databend_spill"

Doc updated: https://docs.databend.com/guides/data-management/data-recycle#spill-data-storage

@rad-pat

rad-pat commented Mar 4, 2025

> Yes, the latest version supports spilling to local disk.

@wubx, this is only for window functions so far, right? The biggest spills we have are aggregation so this won't help yet, is that correct?

@BohuTANG
Member

BohuTANG commented Mar 5, 2025

Hi @inviscid,

Spill to local disk is already supported; see the doc: https://docs.databend.com/guides/data-management/data-recycle#configuring-spill-storage

@wubx
Member

wubx commented Mar 5, 2025

> > Yes, the latest version supports spilling to local disk.
>
> @wubx, this is only for window functions so far, right? The biggest spills we have are aggregation so this won't help yet, is that correct?

#17550
There will be a big improvement after this PR is merged.

Databend hopes to precisely control the memory used by each query and then queue the data spills, so that even the biggest spills can be written out to S3.
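One way to picture per-query memory control with queued spills is a budget that evicts the oldest in-memory blocks into a spill queue once the query exceeds its limit. This is only a sketch under assumed names (`QueryMemoryBudget`, `add_block`, `drain_spills`); it does not describe what #17550 actually implements.

```python
from collections import deque


class QueryMemoryBudget:
    """Tracks in-memory blocks for one query and queues spills when over limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.resident = deque()     # (block_id, size) kept in memory, oldest first
        self.spill_queue = deque()  # block ids waiting to be written out

    def add_block(self, block_id, size):
        self.resident.append((block_id, size))
        self.used_bytes += size
        # Evict oldest blocks until the query is back under its limit.
        while self.used_bytes > self.limit_bytes and self.resident:
            victim, victim_size = self.resident.popleft()
            self.used_bytes -= victim_size
            self.spill_queue.append(victim)

    def drain_spills(self, write):
        """Write queued blocks one at a time using the caller's `write` callback."""
        while self.spill_queue:
            write(self.spill_queue.popleft())
```

The `write` callback is where a real system would choose the spill target, for example local disk first and object storage such as S3 when local space runs out.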

Please feel free to share any suggestions you may have. We would love to hear your thoughts!
