Add a feature to inject the row group a row came from when reading a file (still need to read it back, requires changes to DataFusion, but at least would be simple on my end)

One thing you could do is create a `ParquetAccessPlan` that reads only the row group you are interested in.

You would have to run this once per row group, of course, but if you already have all the data and metadata in memory, it probably isn't that bad (and you could scan them all in parallel) 🤔
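To make the shape of that idea concrete, here is a rough, std-only Rust sketch. It does not call DataFusion: `AccessPlanSketch`, `Access`, `scan`, and `scan_each_row_group` are hypothetical stand-ins invented for illustration, and the "file" is just a `Vec` of in-memory row groups. It only shows the pattern of building one plan per row group (scan that group, skip the rest), running the scans in parallel, and tagging each row with the row group it came from.

```rust
use std::thread;

/// Hypothetical stand-in for a per-row-group access decision.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Skip,
    Scan,
}

/// Hypothetical stand-in for an access plan: one decision per row group.
/// This is NOT DataFusion's `ParquetAccessPlan`, only a model of the idea.
#[derive(Clone, Debug, PartialEq)]
struct AccessPlanSketch {
    row_groups: Vec<Access>,
}

impl AccessPlanSketch {
    /// Build a plan that skips every row group except `target`.
    fn only(row_group_count: usize, target: usize) -> Self {
        let mut row_groups = vec![Access::Skip; row_group_count];
        row_groups[target] = Access::Scan;
        Self { row_groups }
    }
}

/// Pretend scan: return the rows of every row group the plan selects,
/// each tagged with the index of the row group it came from.
fn scan(file: &[Vec<i64>], plan: &AccessPlanSketch) -> Vec<(usize, i64)> {
    file.iter()
        .enumerate()
        .filter(|(idx, _)| plan.row_groups[*idx] == Access::Scan)
        .flat_map(|(idx, rows)| rows.iter().map(move |v| (idx, *v)))
        .collect()
}

/// Run one single-row-group scan per row group, each on its own thread.
fn scan_each_row_group(file: &[Vec<i64>]) -> Vec<Vec<(usize, i64)>> {
    let n = file.len();
    thread::scope(|s| {
        // Spawn all scans first so they run concurrently, then join in order.
        let handles: Vec<_> = (0..n)
            .map(|target| s.spawn(move || scan(file, &AccessPlanSketch::only(n, target))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Because each scan touches exactly one row group, the row group index is known up front and can be attached to the output without any change to the reader itself. In a real setup the per-row-group scans would presumably be DataFusion queries rather than threads over in-memory vectors.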

adriangb Sep 25, 2024
Collaborator Author

alamb Sep 26, 2024
Collaborator

adriangb Sep 26, 2024
Collaborator Author

alamb Sep 26, 2024
Collaborator

Answer selected by Jefffrey