Status of the connection #4

Open
daviddetorres opened this issue Nov 14, 2019 · 10 comments

@daviddetorres
Contributor

daviddetorres commented Nov 14, 2019

In some cases the TCP connection to the device is fine, but the Modbus server is not running, the client is connected to a different open port, or the configured Modbus ID is incorrect.

It can also happen, in the README configuration with a Modbus TCP/RTU bridge, that the bridge is online (so the TCP connection is established) but the link to the RTU devices (usually an RS232 or RS485 bus) is broken (bus disconnected, incorrect serial configuration, etc.).

It could be useful to add a metric that reports the connection status (connection_up?), i.e. whether the socket is correctly established, independently of the result of querying the Modbus registers. This can help detect device failures or configuration issues.

If you think any of these ideas are worth pursuing, I could work on a PR.
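
A rough illustration of the distinction described above, in Python for brevity rather than in the exporter's own code. The function name `probe`, the two flag names, and the choice of a "read holding registers" request for register 0 are assumptions made for this sketch, not anything the exporter provides. It reports the socket-level status separately from whether a Modbus reply came back:

```python
import socket
import struct


def probe(host, port, unit_id=1, timeout=1.0):
    """Return separate socket-level and Modbus-level status flags (1 = ok)."""
    status = {"connection_up": 0, "modbus_up": 0}
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return status  # TCP connect failed: nothing listening or host unreachable
    status["connection_up"] = 1
    with sock:
        # MBAP header (transaction 1, protocol 0, length 6, unit id) followed by
        # "read holding registers" (0x03), start address 0, quantity 1.
        request = struct.pack(">HHHBBHH", 1, 0, 6, unit_id, 0x03, 0, 1)
        try:
            sock.sendall(request)
            reply = sock.recv(260)
        except OSError:
            return status  # timeout or reset: socket was up, Modbus did not answer
    # A normal reply echoes the function code in byte 7; an exception reply
    # sets the high bit of that byte instead.
    if len(reply) >= 8 and reply[7] == 0x03:
        status["modbus_up"] = 1
    return status
```

With a bridge that accepts the TCP connection but has a dead RTU bus behind it, this would be expected to report connection_up=1 and modbus_up=0, which is the case the issue is about.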

@RichiH
Owner

RichiH commented Dec 8, 2019

Sounds interesting; the normal pattern would be to dump this on STDOUT/STDERR, but a /metrics endpoint for stats about the exporter itself would be nice and that could carry a counter for failures. That way, operators would know to check the logs.

@daviddetorres
Contributor Author

I'll work on a PR to add these error metrics.

Maybe it would be useful, as you pointed out, to report the number of failures, possibly with a label for the type of failure (connection_error, timeout, incorrect_function?, bad_address?). That way operators would have more information about the kind of problem they are dealing with.
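
One possible shape for such a labelled failure counter, sketched in Python with the standard library only. `ModbusException`, `failure_type`, `record_failure` and the fallback label values are hypothetical names for this example. The mapping relies on the standard Modbus exception codes 0x01 (illegal function) and 0x02 (illegal data address):

```python
import socket
from collections import Counter


class ModbusException(Exception):
    """Hypothetical error carrying the exception code from a Modbus reply."""

    def __init__(self, code):
        super().__init__(f"modbus exception code {code:#04x}")
        self.code = code


# Standard Modbus exception codes mapped to the label values suggested above.
EXCEPTION_LABELS = {0x01: "incorrect_function", 0x02: "bad_address"}

failures = Counter()  # keyed by (target, failure type)


def failure_type(error):
    """Map an error raised by a query to a failure-type label value."""
    if isinstance(error, ModbusException):
        return EXCEPTION_LABELS.get(error.code, "modbus_exception")
    # socket.timeout is a subclass of OSError, so it must be checked first.
    if isinstance(error, socket.timeout):
        return "timeout"
    if isinstance(error, OSError):
        return "connection_error"
    return "unknown"


def record_failure(target, error):
    """Count one failure for this target under its failure-type label."""
    failures[(target, failure_type(error))] += 1
```

An alert could then fire on any increase of the counter, and the label would say which kind of failure to look for in the logs.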

@RichiH
Owner

RichiH commented Dec 27, 2019

Sounds good. If you touch that part, moving the exporter metrics from :0911/metrics to :9602/metrics and the target metrics from :9602/metrics to :9602/modbus/target=1.2.3.4 would be nice.

Basically https://github.com/prometheus/snmp_exporter#usage
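
To make the proposed split concrete, here is a small Python sketch of the routing decision. It assumes the target is passed as a `?target=` query parameter, the way snmp_exporter does it with `/snmp?target=...`. The function name `route` and the returned tuples are made up for this example:

```python
from urllib.parse import parse_qs, urlsplit


def route(url):
    """Decide which kind of metrics a request path should be served."""
    parts = urlsplit(url)
    if parts.path == "/metrics":
        # Metrics about the exporter process itself.
        return ("exporter", None)
    if parts.path == "/modbus":
        # Metrics scraped from one Modbus target, named in the query string.
        targets = parse_qs(parts.query).get("target")
        if targets:
            return ("target", targets[0])
        return ("error", "missing target parameter")
    return ("not_found", None)
```

For example, `route("/modbus?target=1.2.3.4")` returns `("target", "1.2.3.4")`, while `route("/metrics")` returns `("exporter", None)`.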

@daviddetorres
Contributor Author

This seems similar to how the blackbox exporter works. I'll try to work on a PR in the coming days.

@RichiH
Owner

RichiH commented Dec 28, 2019

Yes, we made the blackbox and snmp exporters behave the same, so we already have a bit of a standard.

@daviddetorres
Contributor Author

I'm already working on adding information about the error rate and error codes to the exporter. I think it would also be important to add the duration and number of requests, so that latency, traffic and error rate can be visualized in a dashboard. This is similar to what the blackbox exporter does, but for the Modbus queries.

I'll include them in the exporter's metrics with a label for the target, so these metrics can be visualized both in total and per target.

@RichiH
Owner

RichiH commented Dec 29, 2019

Query runtime should be part of the target metric data. That way, you can pin down specific PLCs becoming slower, etc.
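
A minimal Python sketch of per-target timing, assuming a hypothetical `query(target)` callable that performs the Modbus read. The names `stats` and `timed_query` are illustrative only and do not reflect what the exporter actually exposes:

```python
import time
from collections import defaultdict

# Per-target totals: number of queries, failed queries, seconds spent querying.
stats = defaultdict(lambda: {"requests": 0, "errors": 0, "seconds": 0.0})


def timed_query(target, query):
    """Run query(target) and record its count, outcome and duration."""
    entry = stats[target]
    entry["requests"] += 1
    start = time.perf_counter()
    try:
        return query(target)
    except Exception:
        entry["errors"] += 1
        raise  # the caller still sees the original error
    finally:
        # Runs on success and on failure, so slow failures are counted too.
        entry["seconds"] += time.perf_counter() - start
```

Dividing `seconds` by `requests` for one target gives its average query time, which is what would reveal a single PLC getting slower.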

@daviddetorres
Contributor Author

In PR #7 I added those metrics with the label "target", so error rates, latency, number of queries, etc. can be examined both in total and per target. (I had to deal with specific PLCs that had problems, which is why I added the target label.)

The number of possible values of the label is bounded by the number of Modbus TCP devices. Taking into account that Modbus IP PLCs usually act as aggregators of Modbus RTU devices, there shouldn't be a large number of different targets, so the high-cardinality problem should be contained.

For further information about the cause of a specific PLC failure or malfunction there are the logs, but at least an alert can be configured to signal that something is happening, which target has problems and what kind of error it is.

@dssantana-zz

@daviddetorres We are having issues with some connections to the Modbus server: some sessions are left in CLOSE_WAIT on the server side. We query around 600 devices in the same request. Should we split the request?

@dshatokhin
Contributor

dshatokhin commented May 6, 2020

Modbus has an interesting function code, "0x11 - Report Slave ID". You can send this request to a Modbus device and, if the device is up and running, get its unit ID back.
In other words, a Modbus ping.
Unfortunately, I'm only halfway through my online Golang course, so my skill and experience are far from enough for a useful PR at this point.
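
For illustration, the "ping" frame could look roughly like this in Python. This is only a sketch: `build_ping` and `is_alive` are hypothetical helper names, the framing assumes Modbus TCP (MBAP header followed by the function code), and not every device implements function 0x11. The data after the function code in a normal reply (server ID and run indicator) is device specific and is not parsed here:

```python
import struct

REPORT_SERVER_ID = 0x11  # "Report Slave ID" in older revisions of the spec


def build_ping(unit_id, transaction_id=1):
    """Build a Modbus TCP request carrying only function code 0x11."""
    # MBAP header: transaction id, protocol id 0, length of what follows
    # (unit id + function code = 2 bytes), then the unit id itself.
    return struct.pack(">HHHBB", transaction_id, 0, 2, unit_id, REPORT_SERVER_ID)


def is_alive(reply, unit_id, transaction_id=1):
    """True if reply is a normal (non-exception) answer to build_ping."""
    if len(reply) < 8:
        return False
    tid, proto, _length, unit, function = struct.unpack(">HHHBB", reply[:8])
    # An exception reply carries the function code with the high bit set (0x91).
    return (tid, proto, unit, function) == (transaction_id, 0, unit_id, REPORT_SERVER_ID)
```

The request for unit 5 is the eight bytes `00 01 00 00 00 02 05 11`. A reply whose eighth byte is `0x91` would be an exception response and is treated as "not alive" by this sketch.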
