Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[boschshc] Discovery Frequently Fails with ConnectExceptions and SocketTimeoutExceptions #16174

Closed
david-pace opened this issue Jan 2, 2024 · 9 comments · Fixed by #16211
Closed
Assignees
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@david-pace
Copy link
Member

david-pace commented Jan 2, 2024

Expected Behavior

  • The thing discovery should not query the same IP address concurrently in multiple threads
  • The thing discovery should not be executed too often

Current Behavior

When setting the log level to DEBUG for the boschshc binding, we see a lot of errors like the following in the log:

Discovering failed with exception java.net.ConnectException: Connection refused
Discovering failed with exception java.net.SocketTimeoutException: Connect Timeout

The exceptions as such are to be expected, since the discovery tries to call the URL https://[IP]:8443/smarthome/public/information on every device that advertises a service with the mDNS pattern _http._tcp.local.. Since the pattern is very generic, it can be expected that some devices will not accept a connection with that URL on port 8443 and thus the discovery gets a connection refused response or runs into a timeout.

However, the following things are not expected:

  • The discovery queries certain devices concurrently multiple times, running into connection refused or timeouts for the same IP address concurrently

For analysis, I enhanced the logging on my system and I see "blocks" of ConnectExceptions and SocketTimeoutExceptions like this:

00:19:35.434 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.435 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.434 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused

When setting a breakpoint in the debugger, I see that multiple threads try to query the same IP address concurrently. In this case the device is concurrently queried in 9 threads. I don't know if this is relevant but the device with the .48 address is a printer in my case.

  • In general, the discovery is executed quite often, but the interval seems to be random. In a log file I analyzed the exception blocks occurred in the following intervals (in seconds): 15, 3, 20, 60, 20, 20, 60, 25, 15, 15, 15, 7, 50, 10

Issues with the discovery were also reported by our users @xxthunder and @mike-bike in #16002 and #15912.

Possible Solution

  • Find out why so many threads try to query the same IP concurrently
  • Find out why the interval between the discovery executions is so random
  • Improve logging
    • logger.trace("Discovering ipAddress {}", url); is misleading: URL is not an IP address
    • logger.debug("Discovering failed with status code: {}", contentResponse.getStatus()); and logger.debug("Discovering failed with exception {}", e.getMessage());: Discovery instead of Discovering, discuss if "failed" is the right wording here because this is expected for non-SHC devices, discuss log level debug
    • Consider adding the IP adresses to the log messages on debug level (right now the IP can only be seen on trace level, but it can't be correlated with the exceptions because of the concurrency / multiple threads)

Maybe @GerdZanker could take a look at this because he initially implemented the discovery and might know some important details?

Steps to Reproduce (for Bugs)

Prerequisites: at least one device in the network apart from the Bosch Smart Home Controller that advertises a HTTP server (e.g. a printer).

  1. Set the log level of the boschshc binding to DEBUG or TRACE
  2. Start openHAB
  3. Analyze the content of the openhab.log file

Context

Discovery should run regularly in a stable manner and provide appropriate log entries

Your Environment

  • Version used: openHAB 4.1.0
@david-pace david-pace added the bug An unexpected problem or unintended behavior of an add-on label Jan 2, 2024
@david-pace
Copy link
Member Author

david-pace commented Jan 3, 2024

Here is some additional debug information:

I enhanced the logging to print the IP address (which is always the same in the following log entries, an IP of a printer), the thread's toString() representation, the thread ID and the exception. Here are the relevant log entries grouped by concurrency (log entries of different threads happening in the same second are grouped together):

16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.757 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:00.758 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.757 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.686 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused
16:28:22.686 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.691 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.691 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused

16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.SocketTimeoutException: Connect Timeout
16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.SocketTimeoutException: Connect Timeout
16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.SocketTimeoutException: Connect Timeout

16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:29:52.587 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:29:52.676 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:29:52.676 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused

16:30:07.597 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:07.597 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:07.598 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused

Here we can gather some interesting information:

  • The number of total concurrent threads in a block is always a multiple of 3 (3, 6 or 9 threads)
  • The number of unique threads according to toString() and thread IDs is mostly 3
  • The second block is especially interesting. Here we see a mixture of different thread name prefixes (JmDNS pool- and OH-discovery-). Here we see three different threads with the JmDNS pool- prefix and two different threads with the prefix OH-discovery). That's 5 unique threads in this block.
  • The time interval between the blocks seem to be random: approximately 22, 46, 44, 10, 5, 6, 45, 33, 1 and 10 seconds apart.

@jlaur do you know who implemented the discovery infrastructure and could help us to explain some of these observations?

@GerdZanker
Copy link
Contributor

Hey,
I used other simple "_http._tcp.local." mDNS Discovery code as examples for my pull request. To be honest - I not made up my mind regard threading, because the BridgeDiscoveryParticipant instance is called by the openhab framework and the framework is a back box.

It makes no sense to me to do so many discovery tries in a short amount of time - every 30 seconds is already very fast from my point of view. I agree that many requests will put a lot of stress on the smart home controller, because it has not a huge processing power. If necessary the SHC code could "cache" the last discovery result and return the cache value for a couple of discovery calls from the framework.

@mike-bike
Copy link

mike-bike commented Jan 3, 2024 via email

@david-pace
Copy link
Member Author

Possible relation to openhab/openhab-core#3906

I will also test the latest SNAPSHOT state because there was another change in the ThingHandlerService which also affected our boschshc ThingDiscoveryService: openhab/openhab-core#3957

@david-pace
Copy link
Member Author

david-pace commented Jan 4, 2024

The latest main state also shows the same issues.

I found out that openHAB indeed receives quite many events from the javax.jmdns service. All of these events are forwarded to the discovery participants and this is done is individual threads.

I have an experimental implementation running on my system which caches the discovery results using the IP address as key. This seems to work fine so far. In the whole log I only see the Discovery failed with exception java.net.ConnectException: Connection refused four times, which seems adequate. After that the result is cached and no physical connection attempts are made afterwards.

I also removed the participant-specific object bridgeInformation which was shared among all the different threads, so this might also have been a problem.

@GerdZanker do you think this approach is viable in general or do you have a better idea? Are IP addresses ok as keys for the cache? Do we need advanced features like evicting entries from the cache after n minutes or something like that?

@mike-bike
Copy link

mike-bike commented Jan 4, 2024 via email

@GerdZanker
Copy link
Contributor

@GerdZanker do you think this (cache) approach is viable in general or do you have a better idea? Are IP addresses ok as keys for the cache? Do we need advanced features like evicting entries from the cache after n minutes or something like that?

Yes, caching or other way to finish the discovery as early and as cheap as possible is good. Have you evaluate to use the complete ServiceInfo instance as a key? Using the IP address is also fine, because I like the documented trade-off from @mike-bike for a pragmatic approach.

@GerdZanker
Copy link
Contributor

I'm still wonder why we run now into the discovery trouble:

  • The jmdns lib is not updated since two years (as far as I was able to trace down the pom.xmls).
  • In a fast spot check of other MDNSDiscoveryParticipant code, no code uses caching.
  • Many Discovery Participants use an own service type with a unique name for the bridge/thing. It is unclear if we could do the same.
  • I saw several checks of the ServiceInfo instance, e.g. evaluation of getPropertyString(...) or getName() return values, which is currently not done in the Bosch SHC code.

@david-pace
Copy link
Member Author

Thank you for your input 👍 Let's discuss further ideas for enhancements in the pull request I just created.

@david-pace david-pace self-assigned this Jan 5, 2024
@david-pace david-pace added the PR pending There is a pull request for resolving the issue label Jan 7, 2024
@david-pace david-pace removed the PR pending There is a pull request for resolving the issue label Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants