feat: support pagination in `list_*` methods in rest catalog #2158
pyiceberg/catalog/rest/__init__.py
Outdated
    list_tables_response = self.list_tables_raw(namespace=namespace, page_size=page_size, next_page_token=next_page_token)
    tables.extend([(*table.namespace, table.name) for table in list_tables_response.identifiers])
    if list_tables_response.next_page_token is None:
        break
    else:
        next_page_token = list_tables_response.next_page_token

return tables
Ideally, we want to return a special `Iterable[Identifier]` that calls the next page when the current page is exhausted. This avoids pulling in all the `Identifier`s right away, reducing memory pressure.
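A minimal sketch of that lazy approach, assuming a hypothetical `fetch_page(token)` callable that returns an object with `identifiers` and `next_page_token` (the same shape as the response in the snippet under review). `Page`, `iter_identifiers`, and `fake_fetch` are illustrative names, not pyiceberg APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

Identifier = Tuple[str, ...]


@dataclass
class Page:
    """Hypothetical stand-in for one list response from the REST catalog."""
    identifiers: List[Identifier]
    next_page_token: Optional[str] = None


def iter_identifiers(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Identifier]:
    """Yield identifiers one at a time, requesting the next page only when needed."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.identifiers
        if page.next_page_token is None:
            return
        token = page.next_page_token


# Fake two-page backend that records which page tokens were requested.
calls: List[Optional[str]] = []
PAGES = {
    None: Page([("db", "a"), ("db", "b")], next_page_token="p2"),
    "p2": Page([("db", "c")]),
}


def fake_fetch(token: Optional[str]) -> Page:
    calls.append(token)
    return PAGES[token]


it = iter_identifiers(fake_fetch)
first = next(it)                 # only the first page has been fetched so far
calls_after_first = list(calls)
rest = list(it)                  # draining the iterator fetches the second page
```

Only one page of identifiers is held in memory at a time, and a caller that stops early never triggers the remaining requests.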
Makes sense, I think I can implement something. That will be a breaking change though, right?
Added an `iter_*` method for each listable and opted to just call `list(iter)` in the existing `list_*` methods to avoid a breaking change. If you have suggestions or guidance on how to accomplish both without making a breaking change, I will happily implement it, as that would be much better, but I didn't see an obvious way to do it.
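The non-breaking variant described here could look roughly like the sketch below. `FakeCatalog` and its in-memory pages are hypothetical and stand in for the real REST calls. The point is only that `list_tables` keeps its `List` return type while `iter_tables` exposes the lazy path.

```python
from typing import Iterator, List, Tuple

Identifier = Tuple[str, ...]


class FakeCatalog:
    """Hypothetical catalog, used only to illustrate the wrapper pattern."""

    def __init__(self, pages: List[List[Identifier]]) -> None:
        self._pages = pages

    def iter_tables(self) -> Iterator[Identifier]:
        # Lazy variant: walks the pages one at a time.
        for page in self._pages:
            yield from page

    def list_tables(self) -> List[Identifier]:
        # Existing signature is unchanged: still returns a fully materialized list.
        return list(self.iter_tables())


catalog = FakeCatalog([[("db", "a")], [("db", "b")]])
tables = catalog.list_tables()
```

Existing callers keep `len()` and indexing, at the cost of `list_tables` still loading every page up front.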
Yes, it will be partially breaking, but `Iterable` is pretty close to `List`, so I think the community might be okay with the change.
Is swapping to `Iterable` what we want to do then? It makes the rest catalog break the interface, as every other catalog returns lists, so we would need to change every other catalog to also return an `Iterable` in the type signature.
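To make the "partially breaking" point concrete, here is a small sketch of what changes for callers. It uses a generator as one possible `Iterable` implementation; the function names are illustrative only.

```python
from typing import Iterator, List, Tuple

Identifier = Tuple[str, ...]


def tables_as_list() -> List[Identifier]:
    return [("db", "a"), ("db", "b")]


def tables_as_iterator() -> Iterator[Identifier]:
    yield ("db", "a")
    yield ("db", "b")


# Code that only loops works the same with either return type.
looped_list = [t for t in tables_as_list()]
looped_iter = [t for t in tables_as_iterator()]

# len() and indexing are list-only; on a generator they raise TypeError.
try:
    len(tables_as_iterator())
    len_supported = True
except TypeError:
    len_supported = False

# A generator can be consumed only once; a second pass yields nothing.
it = tables_as_iterator()
first_pass = list(it)
second_pass = list(it)
```

Callers that need list behaviour can wrap the result in `list(...)`, which is the main migration cost of the change.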
Overnight, I was thinking that maybe we could subclass `List`?
Otherwise, switching to the `Iterable` makes the most sense to me. We can also split out the change into a separate PR and send an email to the devlist to see what others think.
Agreed on switching to `Iterable`; subclassing list seems like a lot just to support pagination. I will have a commit later that makes all list responses from catalogs iterables. I'm fighting one issue with click and how it handles the context, but other than that all tests are passing with that swap.
OK, all iterators now. Seriously, this would have been impossible without the test coverage we have.
I don't think it's even worth making a separate issue with just the iterator change, as we will still have to modify the `list_*` methods in the rest catalog due to that retry decorator to essentially be wrappers around a similar `list_raw` call. Should I just send this PR to the dev list to get some more eyes on it?
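The thread does not spell out why the retry decorator forces this structure. One plausible reading, offered here as an assumption, is that a decorator wrapped around a generator function only guards the creation of the generator, not the requests made while iterating. The sketch below uses a toy `retry_once` decorator and a fake `flaky_fetch`, both hypothetical and unrelated to the retry library the project actually uses.

```python
import functools
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def retry_once(func: Callable[..., T]) -> Callable[..., T]:
    """Toy decorator (hypothetical): retry the call one time on ConnectionError."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            return func(*args, **kwargs)

    return wrapper


attempts: List[str] = []


def flaky_fetch(label: str) -> List[str]:
    """Fails on the first call for a given label, succeeds on the second."""
    attempts.append(label)
    if attempts.count(label) == 1:
        raise ConnectionError("transient failure")
    return ["t1", "t2"]


@retry_once
def iter_tables_decorated() -> Iterator[str]:
    # The body runs during iteration, after the decorator has already returned.
    yield from flaky_fetch("generator")


@retry_once
def list_tables_raw() -> List[str]:
    # A plain function: the failure happens inside the decorator's try block.
    return flaky_fetch("raw")


def iter_tables() -> Iterator[str]:
    # Undecorated generator that delegates each page fetch to the retried call.
    yield from list_tables_raw()


try:
    list(iter_tables_decorated())
    generator_retried = True
except ConnectionError:
    generator_retried = False

tables = list(iter_tables())
```

Under that reading, keeping the decorated unit a plain per-page call and layering the iterator on top gives each page request its own retry.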
Closes #2084
Rationale for this change
Support pagination!
Are these changes tested?
Added tests
Are there any user-facing changes?
Yes. Adds a new `page_size` argument to each `list_*` method on the rest catalog and also adds new `list_*_raw` methods for users to build their own abstractions on top of, in addition to supporting pagination in the existing implementations.
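As an illustration of the kind of abstraction a user might build on a raw method, the sketch below collects only the first `n` identifiers and stops requesting pages once it has them. The `list_tables_raw` here is a hypothetical in-memory stand-in whose page token is simply a string offset; the real method's signature and token format may differ.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, List, Optional, Tuple

Identifier = Tuple[str, ...]
ALL_TABLES: List[Identifier] = [("db", f"t{i}") for i in range(7)]
requests: List[Optional[str]] = []


@dataclass
class ListTablesResponse:
    identifiers: List[Identifier]
    next_page_token: Optional[str]


def list_tables_raw(page_size: int, next_page_token: Optional[str] = None) -> ListTablesResponse:
    """Hypothetical stand-in for a raw call; the token is just a string offset here."""
    requests.append(next_page_token)
    start = int(next_page_token or 0)
    end = start + page_size
    token = str(end) if end < len(ALL_TABLES) else None
    return ListTablesResponse(ALL_TABLES[start:end], token)


def first_n(n: int, page_size: int) -> List[Identifier]:
    """User-side helper: stop requesting pages once n identifiers are collected."""

    def pages() -> Iterator[Identifier]:
        token: Optional[str] = None
        while True:
            response = list_tables_raw(page_size=page_size, next_page_token=token)
            yield from response.identifiers
            token = response.next_page_token
            if token is None:
                return

    return list(islice(pages(), n))


result = first_n(4, page_size=3)
```

With seven tables and a page size of three, asking for four results issues two requests instead of three.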