Should crash if configured collectors are missing #57
Comments
We don't have the manpower to babysit bad source collectors and host modifiers, so failing straight up would not be ideal for us. If we are provided with a bad source collector, it shouldn't interfere with the good ones.

We discussed this issue today, and want to implement a system wherein source collectors (and modifiers?) are disabled after failing X times in Y seconds. This allows us to integrate potentially unstable source collectors without jeopardizing the overall stability of ZAC.

Some source collectors might be more unstable than others; would it make sense to have an overall failure tolerance setting, which can optionally be overridden by individual source collectors? Something like this for global configuration:

    [zac.source_collectors]
    failure_tolerance = 5
    failure_interval = 300

And this for individual source collector configuration:

    [source_collectors.mysource]
    module_name = "mysource"
    update_interval = 60
    failure_tolerance = 10
    failure_interval = 180

Alternatively:

    [source_collectors.mysource]
    module_name = "mysource"
    update_interval = 60
    [source_collectors.mysource.failure]
    tolerance = 10
    interval = 180

These are the names suggested by ChatGPT (I struggled to come up with names myself):

Finally, to support your proposed behavior there could be a configuration option that bypasses this error tolerance strategy entirely (struggling to come up with a good name for this):

    [zac.source_collectors]
    exit_on_failure = true # immediately exit upon encountering a failing source collector

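To make the "disabled after failing X times in Y seconds" idea concrete, here is a minimal sketch of a sliding-window failure counter. It is not ZAC code: `FailureTracker` and `tracker_for` are hypothetical names, the config is modelled as plain dicts, and it assumes that a collector is disabled as soon as it reaches `failure_tolerance` failures inside the window (rather than on the failure after that).

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict


@dataclass
class FailureTracker:
    """Hypothetical helper: counts failures inside a sliding time window."""

    tolerance: int    # assumed to map to `failure_tolerance`
    interval: float   # assumed to map to `failure_interval`, in seconds
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _failures: Deque[float] = field(default_factory=deque, init=False)

    def record_failure(self) -> None:
        """Remember the time of one failed collection run."""
        self._failures.append(self.clock())

    def should_disable(self) -> bool:
        """True if `tolerance` or more failures happened within `interval`."""
        cutoff = self.clock() - self.interval
        # Forget failures that have fallen out of the window.
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.tolerance


def tracker_for(global_cfg: Dict[str, float], collector_cfg: Dict[str, float]) -> FailureTracker:
    """Per-collector values override the global defaults when present."""
    return FailureTracker(
        tolerance=int(collector_cfg.get("failure_tolerance", global_cfg["failure_tolerance"])),
        interval=float(collector_cfg.get("failure_interval", global_cfg["failure_interval"])),
    )
```

A collector process could call `record_failure()` in its exception handler and stop scheduling itself once `should_disable()` returns true. An `exit_on_failure`-style switch would simply skip the tracker and re-raise.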
Now we're not talking about the same issue. Missing collector != failing collector (or failing modifier).

Missing collector, this issue

A missing collector means that all hosts collected by the collector should be deleted from Zabbix (this might not really be true as long as issue #11 persists). This is quite severe. I think ZAC should crash rather than continue, as stated in this issue (even though the failsafe will probably trigger). There is no need to rely on the safety net when you know you're failing.

Failing collector

This isn't a problem? There is nothing to fix? It will raise an exception:

Log and go into a bad state (which can be monitored): zabbix-auto-config/zabbix_auto_config/processing.py Lines 54 to 59 in 2b45f1c

This means that the current collect run is discarded and ZAC will retry at the normal interval. If the next run is ok, the state will also become ok. (I'm pretty sure this happens every now and then at UiO, with e.g. Nivlheim being unavailable, if I remember correctly?)

Failing modifier

Then a host is just not modified: zabbix-auto-config/zabbix_auto_config/processing.py Lines 309 to 311 in 2b45f1c

This could lead to a weird state/unknown problems (very dependent on the modifier, obviously). You can tell from the comment that I was wondering what to do. I now think that the host currently being modified should probably be discarded (kind of like a failing collector), but that is really a different issue from the one I'm raising here.

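The "discard the host if a modifier fails" idea could look roughly like the sketch below. This is an illustration only, not the code in processing.py: `apply_modifiers`, `Host` and `Modifier` are hypothetical names, and a host is modelled as a plain dict rather than ZAC's own host model.

```python
import copy
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)

Host = Dict[str, object]           # stand-in for ZAC's host model (assumption)
Modifier = Callable[[Host], Host]  # stand-in for a host modifier's entry point


def apply_modifiers(host: Host, modifiers: Iterable[Modifier]) -> Optional[Host]:
    """Return the fully modified host, or None if any modifier raised."""
    # Work on a copy so a modifier that fails halfway cannot leave the
    # caller's host in a partially modified state.
    modified = copy.deepcopy(host)
    for modifier in modifiers:
        try:
            modified = modifier(modified)
        except Exception:
            logger.exception("Modifier %r failed; discarding host", modifier)
            return None
    return modified
```

The caller would skip any host for which `None` is returned, in the same way a failed collector run is discarded and retried on the next interval.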
Ok, my bad, I can create a separate issue. But don't you think there should be consistent behavior both in terms of loading and running external modules?

Loading a modifier: zabbix-auto-config/zabbix_auto_config/processing.py Lines 236 to 243 in 2b45f1c

Loading a collector: zabbix-auto-config/zabbix_auto_config/__init__.py Lines 28 to 33 in 2b45f1c

In the case of collectors, we at least check if the import was successful or not. Even if we're fairly sure a module exists when loading it (in the case of host modifiers), it might contain invalid syntax or otherwise be broken. So some sort of unified interface for loading modules would be nice:

    import importlib
    from types import ModuleType

    # `logger` and `ZACException` are assumed to come from the surrounding package
    def load_module(name: str) -> ModuleType:
        try:
            return importlib.import_module(name)
        except Exception as e:
            if isinstance(e, ModuleNotFoundError):
                msg = f"Unable to find module named {name!r}"
            else:
                msg = f"Unable to import module named {name!r}: {str(e)}"
            logger.error(msg)
            raise ZACException(msg) from e

And then based on the config, callers of this function could decide whether or not a

I already created #58. I think collectors and modifiers currently behave differently and probably can't really be treated as some general external module. One is explicitly stated in the config. The other is "just there". I'm happy to hear suggestions, but I still think this issue is valid: ZAC shouldn't start if collectors (explicitly stated in the config) are missing.

Isn't that last code bit just kind of wrapping the exception? Does it really add anything?

Not by itself, no, but callers of this function would know that they only need to handle ZACException, which is more predictable than all the possible error states from

So instead of zabbix-auto-config/zabbix_auto_config/__init__.py Lines 23 to 33 in 2b45f1c

we would do:

    def get_source_collectors(config: models.Settings) -> List[SourceCollectorDict]:
        source_collector_dir = config.zac.source_collector_dir
        sys.path.append(source_collector_dir)
        source_collectors = []  # type: List[SourceCollectorDict]
        for (source_collector_name, source_collector_values) in config.source_collectors.items():
            try:
                load_module(source_collector_values.module_name)
            except ZACException as e:
                if config.zac.source_collectors.ignore_missing:
                    logging.warning(
                        "Source collector '%s' import failed: '%s'",
                        source_collector_values.module_name,
                        str(e),
                    )
                    continue
                raise
            # exceptions not handled by `load_module` will always fail
            # ...

So yes, it "only" wraps the error, and we could do a bare

Anyway, I don't feel too strongly about this, but I think fundamentally importing modifiers and collectors is the same thing; we are doing dynamic imports at runtime, and there should be predictable behavior shared between the two.

I have no strong opinion when it comes to wrapping in a

My only strong opinion in this issue is that zac should fail rather than continue when a collector is missing. It's a severe error. There is a big difference here between collectors and modifiers since only collectors can be missing. Zac doesn't know what modifiers to expect. Maybe this should change in the future. Maybe zac should have a configured list of enabled modifiers. I think that's not a bad idea.

If you have a configured collector, but the collector is missing, zac will just continue:
zabbix-auto-config/zabbix_auto_config/__init__.py
Lines 29 to 33 in 2b45f1c
This might lead to unexpected behavior and bad results. It's probably better to raise a ZACException and crash? Thoughts?
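One way to get the "crash if a configured collector is missing" behavior is to check every configured module before any process is started. The sketch below is an assumption-laden illustration, not the project's code: `require_collectors` and `MissingCollectorError` are hypothetical names (the latter stands in for ZACException), the config is reduced to a dict of collector name to `module_name`, and module names are assumed to be top-level modules on `sys.path`.

```python
import importlib.util
from typing import Dict, List


class MissingCollectorError(Exception):
    """Stand-in for the project's ZACException (assumption)."""


def find_missing_modules(module_names: List[str]) -> List[str]:
    """Return the module names that cannot be located on sys.path."""
    # find_spec() returns None for a top-level module that does not exist.
    # It only locates the module; it does not prove the module imports cleanly.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def require_collectors(configured: Dict[str, str]) -> None:
    """Raise if any configured collector module is missing.

    `configured` maps a collector's config name to its `module_name`.
    """
    missing = find_missing_modules(list(configured.values()))
    if missing:
        raise MissingCollectorError(
            "Configured source collectors are missing: " + ", ".join(sorted(missing))
        )
```

Calling `require_collectors(...)` once at startup, after the collector directory has been added to `sys.path`, would stop ZAC before it can act on an incomplete set of hosts.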