
Add MemoryReservation to batch splitting in joins  #13003

@alamb

Description


Is your feature request related to a problem or challenge?

Follow on to #12969 and #12633

In #12633, @mhilton noted that joins sometimes generate giant record batches, which causes problems. @alihan-synnada fixed this in #12969 by splitting those batches before they are emitted, but internally the joins still sometimes build giant batches first.
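Here is a minimal sketch of the kind of splitting involved, assuming it cuts an oversized batch into chunks of at most a target number of rows. `Batch` and `split_batch` are hypothetical stand-ins, not DataFusion's `RecordBatch` or `BatchSplitter`. The point to notice is that the oversized input already exists in memory before it is split, and nothing here tells a memory pool about it.

```rust
/// Hypothetical stand-in for a record batch: only its row count matters here.
struct Batch {
    num_rows: usize,
}

/// Split one oversized batch into chunks of at most `batch_size` rows
/// (simplified, hypothetical; not DataFusion's actual implementation).
fn split_batch(batch: &Batch, batch_size: usize) -> Vec<Batch> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows {
        // The last chunk may be shorter than batch_size.
        let len = batch_size.min(batch.num_rows - offset);
        out.push(Batch { num_rows: len });
        offset += len;
    }
    out
}

fn main() {
    // A join that produced 20,000 rows at once, with a target of 8,192 rows.
    let giant = Batch { num_rows: 20_000 };
    let sizes: Vec<usize> = split_batch(&giant, 8_192).iter().map(|b| b.num_rows).collect();
    println!("{:?}", sizes); // prints [8192, 8192, 3616]
}
```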

As @mhilton says in a comment on #12969:

Unfortunately this doesn't address the actual problem with creating giant batches, which is they require a lot of memory and that memory isn't accounted for in any MemoryPool. Wiring a MemoryReservation into BatchSplitter would probably be enough to address this though.

Describe the solution you'd like

I would like memory accounting to include these large output batches, so that the memory they use is tracked by a MemoryPool.

Describe alternatives you've considered

Wiring a MemoryReservation into BatchSplitter would probably be enough to address this.
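A minimal sketch of what that wiring might look like, assuming the splitter reserves the size of the whole batch when it receives it and releases the reservation once the last chunk has been emitted. `MemoryPool`, `MemoryReservation`, `Batch`, and `BatchSplitter` below are simplified, hypothetical stand-ins: DataFusion's real `MemoryReservation` does have `try_grow` and `free`, but its pool, construction, and error types differ, and the pool limit and row sizes are made-up numbers.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical memory pool: a byte limit plus a running total of reserved bytes.
struct MemoryPool {
    limit: usize,
    used: Cell<usize>,
}

/// Hypothetical reservation, loosely modeled on DataFusion's try_grow / free.
/// Dropping it returns its bytes to the pool.
struct MemoryReservation {
    pool: Rc<MemoryPool>,
    size: usize,
}

impl MemoryReservation {
    fn new(pool: Rc<MemoryPool>) -> Self {
        Self { pool, size: 0 }
    }

    /// Reserve `bytes` more, or fail if that would exceed the pool limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let (used, limit) = (self.pool.used.get(), self.pool.limit);
        if used + bytes > limit {
            return Err(format!("cannot reserve {bytes} bytes ({used} of {limit} in use)"));
        }
        self.pool.used.set(used + bytes);
        self.size += bytes;
        Ok(())
    }

    /// Return everything this reservation holds to the pool.
    fn free(&mut self) {
        self.pool.used.set(self.pool.used.get() - self.size);
        self.size = 0;
    }
}

impl Drop for MemoryReservation {
    fn drop(&mut self) {
        self.free();
    }
}

/// Hypothetical batch: a row count and a fixed size per row, in bytes.
struct Batch {
    num_rows: usize,
    bytes_per_row: usize,
}

/// Hypothetical splitter that accounts for the batch it is buffering.
struct BatchSplitter {
    batch_size: usize,
    buffered: Option<Batch>,
    offset: usize,
    reservation: MemoryReservation,
}

impl BatchSplitter {
    fn new(batch_size: usize, reservation: MemoryReservation) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, buffered: None, offset: 0, reservation }
    }

    /// Drop any previous batch, then reserve the full size of the new one
    /// before buffering it. An over-limit batch is rejected with an error.
    fn set_batch(&mut self, batch: Batch) -> Result<(), String> {
        self.buffered = None;
        self.offset = 0;
        self.reservation.free();
        if batch.num_rows == 0 {
            return Ok(());
        }
        self.reservation.try_grow(batch.num_rows * batch.bytes_per_row)?;
        self.buffered = Some(batch);
        Ok(())
    }

    /// Emit the next chunk of at most `batch_size` rows; the reservation is
    /// held until the last chunk is emitted, then released.
    fn next_batch(&mut self) -> Option<Batch> {
        let (num_rows, bytes_per_row) = match &self.buffered {
            Some(b) => (b.num_rows, b.bytes_per_row),
            None => return None,
        };
        let len = self.batch_size.min(num_rows - self.offset);
        self.offset += len;
        if self.offset >= num_rows {
            self.buffered = None;
            self.reservation.free();
        }
        Some(Batch { num_rows: len, bytes_per_row })
    }
}

fn main() {
    let pool = Rc::new(MemoryPool { limit: 1_000_000, used: Cell::new(0) });
    let mut splitter = BatchSplitter::new(8_192, MemoryReservation::new(Rc::clone(&pool)));
    // 20,000 rows * 40 bytes = 800,000 bytes: fits, and is now visible to the pool.
    splitter.set_batch(Batch { num_rows: 20_000, bytes_per_row: 40 }).unwrap();
    while let Some(chunk) = splitter.next_batch() {
        println!("chunk of {} rows, reserved: {}", chunk.num_rows, pool.used.get());
    }
    // 30,000 rows * 40 bytes = 1,200,000 bytes: rejected instead of going unaccounted.
    println!("{:?}", splitter.set_batch(Batch { num_rows: 30_000, bytes_per_row: 40 }));
}
```

This only makes the memory visible to the pool after the join has already built the batch: an over-limit query fails with an error instead of growing silently, but the allocation itself still happens. Holding the reservation until the last chunk is emitted is also a simplification; if the real splitter hands out zero-copy slices, their shared buffers may outlive the reservation downstream.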

Additional context

No response

Metadata


Labels

enhancement (New feature or request)
