-
Notifications
You must be signed in to change notification settings - Fork 330
Fix TableScan.update
to exclude cached properties
#2178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TableScan.update
to exclude cached properties
#2178
Conversation
pyiceberg/table/__init__.py
Outdated
return type(self)(**{**self.__dict__, **overrides}) | ||
data = {**self.__dict__, **overrides} | ||
|
||
# Cached properties are also stored in the __dict__, so must be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that normal methods annotated with property
(not cached_property
) aren't stored in dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the nicest solution, though it feels like a minimally viable one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #2178 (comment), I'm no longer convinced
you could add |
Yes, I think you're right. I was a bit hesitant to change the constructor, and also technically subclasses of it would need to take |
Maybe its fine to duplicate the update method in the subclasses and leave a comment about why its there for the cached properties? |
Yeah I think maybe we should just make a larger change here and having subclasses implementing updates sounds good. Will need to think about how to go about that though I realised that the current PR (that removes only cached properties) doesn't guard against the case where a subclass has a private member. |
interesting! good catch. I think implicitly the what if we just filter for only the constructor params? wdyt?
|
Yes, I thought about this and think it's a good idea! We achieve something similar to that too with @jayceslesar's Was also toying with a polymorphic solution (#2199) that doesn't assume that |
After some thought, we can maybe narrow down to a few approaches:
@kevinjqliu @jayceslesar, happy to take suggestions / hear thoughts! The first three work; I'm leaning towards (1) currently. |
params = signature(type(self).__init__).parameters.keys() - {"self"} # Skip "self" parameter | ||
kwargs = {param: getattr(self, param) for param in params} # Assume parameters are attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinjqliu, I wrote things slightly differently to #2178 (comment), LMKWYT of this.
Preferred getattr
over self.__dict__
since it feels less low-level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! this is great. lets add some comments to explain why we're doing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! 776c433
Thanks for the thoughtful explanation @jayceslesar I prefer option 1 too. We're making the intent of Option 2 introduces some maintenance burden; we'd have to remember updating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im in favor of option 1, the current approach
Looks like theres a merge conflict too
params = signature(type(self).__init__).parameters.keys() - {"self"} # Skip "self" parameter | ||
kwargs = {param: getattr(self, param) for param in params} # Assume parameters are attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! this is great. lets add some comments to explain why we're doing this
@kevinjqliu, merged and added comment! Are we good to merge? |
TableScan.update
to exclude cached properties
Thanks for adding a fix this bug @smaheshwar-pltr. And thanks for the great discussion and review @jayceslesar |
Rationale for this change
Closes #2179.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, the scenario shown in the test and described in the issue now works.