-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mutable attributes being shared between processors and datasets #366
Conversation
backend/abstract/processor.py
Outdated
#: Is this processor running 'within' a preset processor? | ||
is_running_in_preset = False | ||
|
||
#: This will be defined automatically upon loading the processor. There is | ||
#: no need to override manually | ||
filepath = None | ||
|
||
# def __init__(self, logger, job, queue=None, manager=None, modules=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this year for the moment as a comment. It does not seem necessary for the base processor class to have either options
or parameters
as they are updated when needed.
options
in particular is always retrieved by the get_options
method which correctly returns a blank dictionary if the child class does not define options
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, most of the set-up for BasicProcessor
happens in work()
anyway since it is always instantiated in a separate thread. So insofar as these properties need to be set up that is where it would happen and it is also where self.parameters
is set.
To remind myself why this matters:
Output:
|
LGTM once the conflicts are resolved (took a look myself but I'd prefer if you do it @dale-wahl, just so I don't make the wrong choice of what attributes to declare) |
Merged and tested with no noted ill effects. Those can be discovered later 😁 |
Sometimes
options
and/orparameters
appear to be updated. I have yet to be able to consistently reproduce some of these errors, but can find examples.This count posts analysis for example somehow picked up the
add_relative
option and had it set toTrue
. That should never happen to a Twitter dataset. Unfortunately, I cannot recreate the error (though somehow a similar processor ran withadd_relative
asFalse
, even though that option should not even appear to select).I thought it was connect to this issue. But that does not seem to be the case as I can recreate that environment and there are still random columns appearing. My best guess there is now that something in the frontend jinja is holding data from a previous
dataset
oroptions
that appears whenget_columns
fails.This PR can be merged (at this time), but I have not figured out how to test that it actually solves the problem.