[dnssd-plat] integrate DnssdPlatform into Application #2677
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2677       +/-   ##
===========================================
- Coverage   55.77%   43.30%   -12.48%
===========================================
  Files          87      108       +21
  Lines        6890    13403     +6513
  Branches        0      964      +964
===========================================
+ Hits         3843     5804     +1961
- Misses       3047     7292     +4245
- Partials        0      307      +307
a628b1b to 6554a0d
d42b3c2 to 6e525fa
LGTM 👍
c6fcb81 to 59450f9
The CI failure should be resolved once #2702 is merged.
LGTM. Thanks. 👍
23ef78c to ac87936
Met CI failures in
ccf2a2a to 2822a59
@@ -140,6 +144,9 @@ otbrError Application::Run(void)
     // allow quitting elegantly
     signal(SIGTERM, HandleSignal);

     // avoid exiting on SIGPIPE
     signal(SIGPIPE, SIG_IGN);
I noticed that a SIGPIPE signal was triggered and made the program exit when calling an mDNSResponder API. I'm not sure whether ignoring the signal is fine.
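For reference, a minimal sketch of what ignoring SIGPIPE changes, using a plain pipe rather than the mDNSResponder client socket (the helper name WriteToClosedPipe is hypothetical and not part of ot-br-posix). With the signal ignored, a write to an endpoint with no reader fails with EPIPE instead of terminating the process, so the caller still has to handle the error:

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: writes one byte to a pipe whose read end is already
// closed and returns the errno observed (0 if the write unexpectedly succeeds).
int WriteToClosedPipe(void)
{
    int  fds[2];
    char byte = 0;
    int  err  = 0;

    if (pipe(fds) != 0)
    {
        return errno;
    }

    signal(SIGPIPE, SIG_IGN); // process-wide disposition, as in the diff above
    close(fds[0]);            // no reader is left on the pipe

    // Without SIG_IGN this write would raise SIGPIPE and end the process.
    if (write(fds[1], &byte, 1) < 0)
    {
        err = errno;
    }

    close(fds[1]);
    return err;
}
```

On a POSIX system the function is expected to return EPIPE, which is the error the calling code would then need to check for.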
It would be better to understand why/how this is being signaled. Ignoring it may cause more harm later on, especially if it is happening due to some module not being ready or set up.
It seems to originate from us calling the mDNSResponder APIs?
My guess is that maybe we call an API somehow when the underlying mDNSResponder is not yet ready (too early)?
It got SIGPIPE at this line, in DNSServiceRemoveRecord:
ot-br-posix/src/mdns/mdns_mdnssd.cpp, line 748 in 3f28d4c:
    dnsError = DNSServiceRemoveRecord(serviceRef, mRecordRef, /* flags */ 0);
At that time the underlying mDNSResponder wasn't ready.
IIUC mDNSResponder supports turning the SIGPIPE off on OSX: https://github.com/apple-oss-distributions/mDNSResponder/blob/71e6611203d57c78b26fd505d98cb57a33d00880/mDNSShared/dnssd_clientstub.c#L839. So I think it should be fine for us to ignore it? Though the scope is different (per process vs per socket).
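To illustrate the scope difference, here is a rough sketch of suppressing SIGPIPE for a single socket or a single call instead of for the whole process. It assumes a platform that offers either the SO_NOSIGPIPE socket option (macOS/BSD) or the MSG_NOSIGNAL send flag (Linux); the helper name SendToClosedPeer is hypothetical, and this is not how ot-br-posix or the mDNSResponder client stub is written:

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: sends one byte on a socket whose peer has been closed
// and returns the errno observed (0 if the send unexpectedly succeeds).
int SendToClosedPeer(void)
{
    int  fds[2];
    char byte  = 0;
    int  flags = 0;
    int  err   = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
    {
        return errno;
    }

#if defined(SO_NOSIGPIPE)
    // macOS/BSD: suppress SIGPIPE for this socket only.
    int on = 1;
    setsockopt(fds[0], SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
#elif defined(MSG_NOSIGNAL)
    // Linux: suppress SIGPIPE for this send() call only.
    flags = MSG_NOSIGNAL;
#endif

    close(fds[1]); // the peer goes away

    if (send(fds[0], &byte, 1, flags) < 0)
    {
        err = errno;
    }

    close(fds[0]);
    return err;
}
```

Since the socket used by the DNSService* calls is owned by the client library, the process-wide signal(SIGPIPE, SIG_IGN) may be the only knob available to the application, which is the trade-off raised above.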
I still think it would be better to investigate and understand this better.
We have State in Publisher to track whether the underlying mDNS is ready, and there are explicit checks before all method calls, like this:
    if (mState != State::kReady)
    {
        error = OTBR_ERROR_INVALID_STATE;
        std::move(aCallback)(error);
        ExitNow();
    }
So why do we get to this part of the code when it may not be ready? Do we not detect that State has changed and it is not ready? Or are there some other missing state checks somewhere?
The scenario is:
- mdnsd gets restarted.
- The Publisher notices this via the kDNSServiceErr_ServiceNotRunning error code, then tries to restart:
  ot-br-posix/src/mdns/mdns_mdnssd.cpp, line 391 in e2f3c1f:
      Stop(kStopOnServiceNotRunningError);
- In Stop(), it wants to remove all registrations:
  ot-br-posix/src/mdns/mdns_mdnssd.cpp, line 260 in e2f3c1f:
      mServiceRegistrations.clear();
- In the destructor of key registration, it unregisters itself by calling DNSServiceRemoveRecord:
  ot-br-posix/src/mdns/mdns_mdnssd.cpp, line 748 in e2f3c1f:
      dnsError = DNSServiceRemoveRecord(serviceRef, mRecordRef, /* flags */ 0);
Per your idea, I think we should add/move such checks into every registration type's Register or Unregister method. I can send a PR later.
However, I think this cannot fully solve the issue. The restart of mdnsd could happen at any time. If it restarts right before the DNSServiceRemoveRecord call, it will cause the SIGPIPE before Publisher handles the state change.
Thanks for the details. I think we should also change the order in Stop() and set mState first, before the list/entry deallocation calls.
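A simplified sketch of that ordering, using hypothetical stand-in types (FakePublisher, FakeRegistration, mDaemonCalls) rather than the real ot-br-posix classes. It assumes each registration's destructor checks the publisher state before talking to the daemon; the counter stands in for a DNSServiceRemoveRecord-style call:

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins; not the real ot-br-posix Publisher or registrations.
enum class State
{
    kIdle,
    kReady
};

struct FakePublisher;

struct FakeRegistration
{
    explicit FakeRegistration(FakePublisher &aPublisher)
        : mPublisher(aPublisher)
    {
    }
    ~FakeRegistration(void);

    FakePublisher &mPublisher;
};

struct FakePublisher
{
    State mState       = State::kReady;
    int   mDaemonCalls = 0; // counts simulated calls into the mDNS daemon
    std::vector<std::unique_ptr<FakeRegistration>> mRegistrations;

    void Stop(void)
    {
        mState = State::kIdle;  // update the state first ...
        mRegistrations.clear(); // ... so the destructors below observe kIdle
    }
};

FakeRegistration::~FakeRegistration(void)
{
    // Only talk to the daemon while the publisher still considers it ready.
    if (mPublisher.mState == State::kReady)
    {
        mPublisher.mDaemonCalls++;
    }
}
```

With this order, calling Stop() on a publisher that holds registrations leaves mDaemonCalls at 0. It narrows the window but does not remove it, since the daemon can still restart between the state check and the call.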
Sent #2725 to apply the suggested order of actions in Stop().
2822a59 to 4b2a284
4b2a284 to b4d5157