Scan Delete Support Part 4: Delete File Loading; Skeleton for Processing #982

Open: sdd wants to merge 12 commits into main

sdd (Contributor) commented Feb 21, 2025

Extends the DeleteFileManager introduced in #950 to include loading of delete files, storage and retrieval of parsed delete files from shared state, and an outline of how parsing will connect to this new work.

Issue: #630
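The loading flow described above (load a delete file once, store the parsed result in shared state, and let later tasks retrieve it) can be sketched in std-only Rust as follows. This is a minimal sketch under stated assumptions, not the PR's code: `DeleteFileManagerSketch`, `DeleteFileState`, and `ParsedPositionalDeletes` are hypothetical names, the `parse` closure stands in for fetching and decoding a delete file, and an `RwLock<HashMap<..>>` stands in for the shared state (the PR's async design stores shared futures in the state instead).

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical parsed positional delete file: deleted row positions in one data file.
#[derive(Clone, Debug, PartialEq)]
pub struct ParsedPositionalDeletes {
    pub data_file_path: String,
    pub positions: Vec<u64>,
}

/// Hypothetical shared state: parsed delete files keyed by delete-file path.
#[derive(Default)]
pub struct DeleteFileState {
    parsed: HashMap<String, Arc<ParsedPositionalDeletes>>,
}

/// Hypothetical manager handle; clones share one state, as concurrent scan tasks would.
#[derive(Clone, Default)]
pub struct DeleteFileManagerSketch {
    state: Arc<RwLock<DeleteFileState>>,
}

impl DeleteFileManagerSketch {
    /// Retrieve an already-parsed delete file from the shared state, if present.
    pub fn get(&self, path: &str) -> Option<Arc<ParsedPositionalDeletes>> {
        self.state.read().unwrap().parsed.get(path).cloned()
    }

    /// Load a delete file: reuse the stored result if present, otherwise parse and store it.
    /// `parse` is a stand-in for fetching and decoding the file.
    pub fn load<F>(&self, path: &str, parse: F) -> Arc<ParsedPositionalDeletes>
    where
        F: FnOnce(&str) -> ParsedPositionalDeletes,
    {
        if let Some(hit) = self.get(path) {
            return hit;
        }
        // Parse without holding the lock; if another task stored the file first, keep theirs.
        let parsed = Arc::new(parse(path));
        let mut state = self.state.write().unwrap();
        Arc::clone(state.parsed.entry(path.to_string()).or_insert(parsed))
    }
}
```

Because `parse` runs outside the lock, two tasks racing on the same file may both parse it; `or_insert` keeps whichever result was stored first, so every caller still ends up sharing a single `Arc`.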

sdd force-pushed the feat/delete-fila-manager-loading branch 5 times, most recently from edb1d27 to 8e90bdd on February 23, 2025 14:55
sdd marked this pull request as ready for review on February 26, 2025 09:20
sdd force-pushed the feat/delete-fila-manager-loading branch 4 times, most recently from ec8e7c1 to 06f0df5 on March 5, 2025 19:53

jonathanc-n (Contributor) left a comment

This is nice; I'll look at the parsed records next.

sdd force-pushed the feat/delete-fila-manager-loading branch 6 times, most recently from 5530bc3 to e997fc6 on March 31, 2025 17:27

sdd (Contributor Author) commented Apr 3, 2025

@liurenjie1024, @Xuanwo, @Fokko - this is ready for re-review; if you could take a look, that would be great!

sdd force-pushed the feat/delete-fila-manager-loading branch from e997fc6 to 056e73f on April 3, 2025 07:28

liurenjie1024 (Contributor) left a comment

Thanks @sdd for this PR. There are some missing points in the current design. I would also suggest not putting too much into DeleteFileManager. I think DeleteFileManager should act more like a delete loader, managing the IO and caching of record batches, while the actual filtering is delegated to a DeleteFilter. WDYT? I think a good reference implementation is Java's DeleteFilter, see https://github.com/apache/iceberg/blob/af8e3f5a40f4f36bbe1d868146749e2341471586/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L50
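The split suggested here, with a loader that owns IO and caching and a separate filter that applies the deletes, might look roughly like the following std-only sketch. `DeleteLoader` and `DeleteFilter` are hypothetical names chosen for illustration; they are not iceberg-rust's API or a port of Java's `DeleteFilter`. The `read_positions` closure stands in for reading a positional delete file, and the filter works on plain row positions rather than Arrow record batches.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Hypothetical loader: owns IO and caching only; it knows nothing about filtering.
pub struct DeleteLoader<R: Fn(&str) -> Vec<u64>> {
    // Stand-in for reading a positional delete file into deleted row positions.
    read_positions: R,
    cache: Mutex<HashMap<String, Arc<Vec<u64>>>>,
}

impl<R: Fn(&str) -> Vec<u64>> DeleteLoader<R> {
    pub fn new(read_positions: R) -> Self {
        Self { read_positions, cache: Mutex::new(HashMap::new()) }
    }

    /// Return the positions for `path`, reading the file only on the first request.
    pub fn load(&self, path: &str) -> Arc<Vec<u64>> {
        let mut cache = self.cache.lock().unwrap();
        let entry = cache
            .entry(path.to_string())
            .or_insert_with(|| Arc::new((self.read_positions)(path)));
        Arc::clone(entry)
    }
}

/// Hypothetical filter: applies already-loaded deletes to rows; it does no IO itself.
pub struct DeleteFilter {
    deleted: HashSet<u64>,
}

impl DeleteFilter {
    /// Gather, through the loader, the deletes that apply to one data file.
    pub fn new<R: Fn(&str) -> Vec<u64>>(loader: &DeleteLoader<R>, delete_files: &[&str]) -> Self {
        let mut deleted = HashSet::new();
        for path in delete_files {
            deleted.extend(loader.load(path).iter().copied());
        }
        Self { deleted }
    }

    /// Return the row positions in `start..start + len` that survive the deletes.
    pub fn live_positions(&self, start: u64, len: u64) -> Vec<u64> {
        (start..start + len).filter(|pos| !self.deleted.contains(pos)).collect()
    }
}
```

Keeping the cache in the loader means a second filter over the same delete file reuses the parsed positions instead of re-reading them. This sketch holds the cache lock while reading, which keeps it short but serializes reads; a production loader would probably avoid that.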

sdd (Contributor Author) commented Apr 14, 2025

Thanks for the review @liurenjie1024 - much appreciated. Will come back with a revised design.

sdd force-pushed the feat/delete-fila-manager-loading branch 2 times, most recently from bd33aa5 to 39a26ab on April 17, 2025 06:39
sdd force-pushed the feat/delete-fila-manager-loading branch 3 times, most recently from 5739a46 to 52cf8b9 on April 23, 2025 21:07
/// as per the other delete file types - only this time it is accompanied by a one-shot
/// channel sender that we will eventually use to resolve the shared future that we stored
/// in the state.
/// * When this gets updated to add support for delete vectors, the load phase will return
Contributor

Looking forward to the puffin / deletion vector support!

Contributor Author

Me too! 😁
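The doc comment quoted in this thread describes storing a shared future in the state during the load phase and later resolving it through a one-shot channel sender. The real implementation is async; the std-only sketch below swaps in a `Mutex`/`Condvar` slot for the shared future and a capacity-1 `mpsc::sync_channel` that is sent on once for the one-shot channel. `SharedSlot`, `DeleteState`, and `start_load` are hypothetical names, not the PR's types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, SyncSender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Stand-in for a shared future: any number of tasks can wait on it until it is resolved.
#[derive(Clone)]
pub struct SharedSlot<T> {
    inner: Arc<(Mutex<Option<T>>, Condvar)>,
}

impl<T: Clone> SharedSlot<T> {
    fn new() -> Self {
        Self { inner: Arc::new((Mutex::new(None), Condvar::new())) }
    }

    fn resolve(&self, value: T) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = Some(value);
        cvar.notify_all();
    }

    /// Block until the slot is resolved, then return a clone of the value.
    pub fn wait(&self) -> T {
        let (lock, cvar) = &*self.inner;
        let mut guard = lock.lock().unwrap();
        while guard.is_none() {
            guard = cvar.wait(guard).unwrap();
        }
        (*guard).clone().expect("loop exits only once resolved")
    }
}

/// Hypothetical shared state: delete-file path -> slot that resolves to parsed positions.
#[derive(Default)]
pub struct DeleteState {
    slots: Mutex<HashMap<String, SharedSlot<Vec<u64>>>>,
}

impl DeleteState {
    /// Load phase: the first caller for `path` stores an unresolved slot and receives a
    /// one-shot sender for the parse result; later callers get `None` and use `get`.
    pub fn start_load(&self, path: &str) -> Option<SyncSender<Vec<u64>>> {
        let mut slots = self.slots.lock().unwrap();
        if slots.contains_key(path) {
            return None;
        }
        let slot = SharedSlot::new();
        slots.insert(path.to_string(), slot.clone());
        // A capacity-1 channel that is sent on once stands in for a one-shot channel.
        let (tx, rx): (SyncSender<Vec<u64>>, Receiver<Vec<u64>>) = mpsc::sync_channel(1);
        thread::spawn(move || {
            // Resolve the shared slot when the parse phase sends its result. In this sketch,
            // dropping the sender without sending leaves waiters blocked.
            if let Ok(positions) = rx.recv() {
                slot.resolve(positions);
            }
        });
        Some(tx)
    }

    /// Fetch the slot for `path` so a caller can wait on it.
    pub fn get(&self, path: &str) -> Option<SharedSlot<Vec<u64>>> {
        self.slots.lock().unwrap().get(path).cloned()
    }
}
```

Storing the unresolved slot before parsing starts lets other tasks find and wait on the in-flight load instead of starting a duplicate one.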

liurenjie1024 previously approved these changes on May 16, 2025

liurenjie1024 (Contributor) left a comment

Thanks @sdd for this pr!

liurenjie1024 (Contributor)

Let's wait to merge this until after the 0.5.0 release.

sdd force-pushed the feat/delete-fila-manager-loading branch 3 times, most recently from 078bba7 to fc696ef on May 17, 2025 20:31
sdd force-pushed the feat/delete-fila-manager-loading branch 6 times, most recently from 20b44ab to acd7ab8 on May 22, 2025 05:39

sdd (Contributor Author) commented May 23, 2025

Hi @liurenjie1024 / @Xuanwo / @xxchan.

This is now ready for review again after a refactor that takes into account @xxchan's great feedback. I'll be on holiday for a week after today, so it would be great if you guys could take a look. Thanks!

sdd force-pushed the feat/delete-fila-manager-loading branch from acd7ab8 to b147098 on May 23, 2025 06:18
5 participants