Fix mutable attributes being shared between processors and datasets #366

Merged: 5 commits merged on Oct 12, 2023


dale-wahl (Member) commented May 30, 2023

Sometimes options and/or parameters appear to be updated unexpectedly. I have not yet been able to consistently reproduce some of these errors, but I can find examples.

This count posts analysis, for example, somehow picked up the add_relative option and had it set to True. That should never happen to a Twitter dataset. Unfortunately, I cannot recreate the error (though somehow a similar processor ran with add_relative set to False, even though that option should not even be available to select).

I thought it was connected to this issue, but that does not seem to be the case: I can recreate that environment and random columns still appear. My best guess now is that something in the frontend Jinja templates is holding on to data from a previous dataset, or its options, which then appear when get_columns fails.

This PR can be merged (at this time), but I have not yet figured out how to test whether it actually solves the problem.

#: Is this processor running 'within' a preset processor?
is_running_in_preset = False

#: This will be defined automatically upon loading the processor. There is
#: no need to override manually
filepath = None

# def __init__(self, logger, job, queue=None, manager=None, modules=None):
dale-wahl (Member Author) commented:

I left this here as a comment for the moment. It does not seem necessary for the base processor class to declare either options or parameters, since they are set when needed.

options in particular is always retrieved via the get_options method, which correctly returns an empty dictionary if the child class does not define options itself.
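A minimal sketch of the pattern described here, assuming a hypothetical BaseProcessor that declares no options class attribute at all. This get_options is illustrative only and is not 4CAT's actual method: it falls back to a fresh empty dictionary when a subclass defines no options, and (an extra assumption beyond the comment above) returns a deep copy so callers cannot mutate the definition stored on the class. CountPosts and NoOptions are hypothetical subclasses.

```python
import copy


class BaseProcessor:
    """Hypothetical base class: deliberately has no `options` class attribute."""

    @classmethod
    def get_options(cls):
        # Use the subclass's `options` if it defines one, otherwise a new empty
        # dict. deepcopy gives each caller its own copy, so changes made by the
        # caller never reach the dictionary stored on the class.
        return copy.deepcopy(getattr(cls, "options", {}))


class CountPosts(BaseProcessor):
    # Hypothetical subclass that declares its own options
    options = {"timeframe": {"default": "month"}}


class NoOptions(BaseProcessor):
    # Hypothetical subclass without options
    pass


opts = CountPosts.get_options()
opts["add_relative"] = {"default": True}  # only changes the local copy
print(CountPosts.options)       # {'timeframe': {'default': 'month'}}
print(NoOptions.get_options())  # {}
```

Whether copying is needed depends on whether callers ever modify the returned dictionary; without the copy, a subclass that does define options would still hand out its one shared class-level dict.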

Member commented:

That makes sense; most of the set-up for BasicProcessor happens in work() anyway, since it is always instantiated in a separate thread. So insofar as these properties need to be set up, that is where it would happen, and it is also where self.parameters is set.
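To illustrate setting up state inside work() rather than at class level, here is a minimal sketch using a hypothetical SketchProcessor built on threading.Thread. It is not 4CAT's BasicProcessor; the constructor argument job_parameters and the "processed" key are assumptions for illustration. Each instance builds its own parameters dictionary when its thread runs, so nothing is shared between processors.

```python
import threading


class SketchProcessor(threading.Thread):
    """Hypothetical stand-in for a processor; not 4CAT's BasicProcessor."""

    def __init__(self, job_parameters):
        super().__init__()
        self.job_parameters = job_parameters
        self.parameters = None  # set up in work(), not as a class attribute

    def run(self):
        # Thread entry point: delegate to work(), where set-up happens
        self.work()

    def work(self):
        # Build a fresh (shallow) copy for this instance only; the caller's
        # dict and other processors are unaffected by later changes
        self.parameters = dict(self.job_parameters)
        self.parameters.setdefault("processed", True)


a = SketchProcessor({"add_relative": True})
b = SketchProcessor({})
for proc in (a, b):
    proc.start()
for proc in (a, b):
    proc.join()

print(a.parameters)  # {'add_relative': True, 'processed': True}
print(b.parameters)  # {'processed': True}
```

A shallow copy is enough for flat parameter dictionaries; nested values would need copy.deepcopy to be fully independent.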

dale-wahl (Member Author) commented May 30, 2023

To remind myself why this matters:

# parameters is a class attribute: a single dict object shared through inheritance
class BigClass:
    parameters = {"default": "Big"}
print(f"Original BigClass parameters: {BigClass.parameters}\n")

# Sub Class inherits from BigClass
class SubA(BigClass):
    pass
# Instantiate this Sub Class and update the dictionary
sub_a = SubA()
sub_a.parameters.update({"test": "chaos!"})
print("sub_a object instantiated and sub_a.parameters.update({\"test\": \"chaos!\"}) called")
print(f"SubA 'update' parameters points to BigClass parameters: {sub_a.parameters}")
print(f"Modified BigClass parameters changed by SubA 'update' method: {BigClass.parameters}\n")

# Now reassign sub_a parameters to new dictionary
sub_a.parameters = {"completely_new": "SubA"}
print("sub_a assignment via sub_a.parameters = {\"completely_new\": \"SubA\"}")
print(f"SubA parameters are assigned not updated: {sub_a.parameters}")
print(f"No change to BigClass parameters: {BigClass.parameters}")
print("sub_a and BigClass parameters have been decoupled")

Output:

Original BigClass parameters: {'default': 'Big'}

sub_a object instantiated and sub_a.parameters.update({"test": "chaos!"}) called
SubA 'update' parameters points to BigClass parameters: {'default': 'Big', 'test': 'chaos!'}
Modified BigClass parameters changed by SubA 'update' method: {'default': 'Big', 'test': 'chaos!'}

sub_a assignment via sub_a.parameters = {"completely_new": "SubA"}
SubA parameters are assigned not updated: {'completely_new': 'SubA'}
No change to BigClass parameters: {'default': 'Big', 'test': 'chaos!'}
sub_a and BigClass parameters have been decoupled
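For contrast, a minimal sketch of a general Python fix for the pitfall shown above: create the mutable default inside __init__, so update() only ever touches that instance's own dictionary. This is a common pattern, not necessarily what this PR's commits do; the discussion above instead suggests not declaring these attributes on the base processor at all.

```python
class BigClass:
    def __init__(self):
        # Create the dict inside __init__ so every instance gets its own object
        self.parameters = {"default": "Big"}


class SubA(BigClass):
    pass


sub_a = SubA()
sub_a.parameters.update({"test": "chaos!"})  # mutates sub_a's own dict only
sub_b = SubA()

print(sub_a.parameters)                 # {'default': 'Big', 'test': 'chaos!'}
print(sub_b.parameters)                 # {'default': 'Big'}
print(hasattr(BigClass, "parameters"))  # False: no class-level dict to corrupt
```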

stijn-uva marked this pull request as ready for review on September 21, 2023, 14:18

stijn-uva (Member) commented:

LGTM once the conflicts are resolved (I took a look myself, but I'd prefer that you do it, @dale-wahl, so I don't make the wrong choice about which attributes to declare).

dale-wahl (Member Author) commented:

Merged and tested with no noted ill effects.

Those can be discovered later 😁

dale-wahl merged commit 2ef3027 into master on Oct 12, 2023
dale-wahl deleted the mutable_attributes branch on October 12, 2023, 14:28
stijn-uva added this to the 1.37 milestone on Oct 18, 2023