Skip to content

Improve Spill Performance: mmap the spill files #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zebsme
Copy link
Owner

@zebsme zebsme commented Apr 8, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@zebsme
Copy link
Owner Author

zebsme commented Apr 8, 2025

This draft can reproduce the following errors:

FAIL [   0.054s] datafusion::core_integration memory_limit::test_disk_spill_limit_not_reached
FAIL [  21.391s] datafusion::core_integration memory_limit::test_in_mem_buffer_almost_full
FAIL [  48.291s] datafusion::core_integration memory_limit::test_stringview_external_sort
FAIL [   0.334s] datafusion::fuzz fuzz_cases::aggregate_fuzz::test_single_mode_aggregate_with_spill
FAIL [   0.018s] datafusion-physical-plan aggregates::tests::test_aggregate_with_spill_if_necessary
FAIL [   0.090s] datafusion-physical-plan sorts::sort::tests::test_sort_spill_utf8_strings 

These failures share similar output:

──── STDERR:             datafusion::core_integration memory_limit::test_in_mem_buffer_almost_full

thread 'memory_limit::test_in_mem_buffer_almost_full' panicked at datafusion/core/tests/memory_limit/mod.rs:496:32:
called `Result::unwrap()` on an `Err` value: ResourcesExhausted("Failed to allocate additional 3806586 bytes for ExternalSorterMerge[0] with 0 bytes already allocated for this reservation - 2872588 bytes remain available for the total pool")

@zebsme
Copy link
Owner Author

zebsme commented Apr 8, 2025

Apply these code changes to make all tests pass:

// >>> REMOVE THIS BLOCK <<<
// -------------------------------------------
let file = File::open(spill_file.path())?;
let mmap = unsafe { Mmap::map(&file)? };
let bytes = bytes::Bytes::from_owner(mmap);
let buffer = Buffer::from(bytes);
let decoder = IPCBufferDecoder::new(buffer);
// -------------------------------------------
// >>> UNCOMMENT BELOW <<<
// let decoder = IPCBufferDecoder::new(spill_file.path());

// >>> REMOVE THIS BLOCK <<<
// -------------------------------------------------

// // >>> UNCOMMENT BELOW <<<
// // -------------------------------------------------------------

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant