Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dnssd-plat] integrate DnssdPlatform into Application #2677

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

superwhd
Copy link
Contributor

No description provided.

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 43.30%. Comparing base (2b41187) to head (b4d5157).
Report is 947 commits behind head on main.

Files with missing lines Patch % Lines
src/host/posix/dnssd.cpp 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2677       +/-   ##
===========================================
- Coverage   55.77%   43.30%   -12.48%     
===========================================
  Files          87      108       +21     
  Lines        6890    13403     +6513     
  Branches        0      964      +964     
===========================================
+ Hits         3843     5804     +1961     
- Misses       3047     7292     +4245     
- Partials        0      307      +307     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@superwhd superwhd force-pushed the otplatdnssd branch 2 times, most recently from a628b1b to 6554a0d Compare February 7, 2025 02:29
@superwhd superwhd marked this pull request as ready for review February 7, 2025 02:55
@superwhd superwhd changed the title [Don't review] [dnssd-plat] integrate DnssdPlatform into Application [dnssd-plat] integrate DnssdPlatform into Application Feb 7, 2025
@superwhd superwhd force-pushed the otplatdnssd branch 2 times, most recently from d42b3c2 to 6e525fa Compare February 7, 2025 04:46
@superwhd superwhd closed this Feb 7, 2025
@superwhd superwhd reopened this Feb 7, 2025
@superwhd superwhd requested review from abtink and Irving-cl February 7, 2025 06:41
Copy link
Contributor

@Irving-cl Irving-cl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@superwhd superwhd force-pushed the otplatdnssd branch 2 times, most recently from c6fcb81 to 59450f9 Compare February 8, 2025 01:45
@superwhd
Copy link
Contributor Author

superwhd commented Feb 8, 2025

The CI failure should be solved when #2702 is meregd.

Copy link
Member

@abtink abtink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. 👍

@superwhd superwhd force-pushed the otplatdnssd branch 3 times, most recently from 23ef78c to ac87936 Compare February 11, 2025 04:54
@superwhd
Copy link
Contributor Author

superwhd commented Feb 11, 2025

Met CI failures in test_mdns_restart.py which I couldn't reproduce locally. Debugging...

@superwhd superwhd force-pushed the otplatdnssd branch 2 times, most recently from ccf2a2a to 2822a59 Compare February 12, 2025 12:19
@@ -140,6 +144,9 @@ otbrError Application::Run(void)
// allow quitting elegantly
signal(SIGTERM, HandleSignal);

// avoid exiting on SIGPIPE
signal(SIGPIPE, SIG_IGN);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed SIGPIPE signal was triggred and it made the program exit when calling mDNSResponder API. Not sure if ignoring the signal is fine.

@abtink

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to understand why/how this is being signaled. Ignoring it may cause more harm later on, especially if it is happening due to some module not being ready or set up.

It seems to originate from us calling the mDNSResponder APIs?
My guess is that maybe we call an API somehow when the underlying mDNSResponder is not yet ready (too early)?

Copy link
Contributor Author

@superwhd superwhd Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It got SIGPIPE at this line:

dnsError = DNSServiceRemoveRecord(serviceRef, mRecordRef, /* flags */ 0);
by calling DNSServiceRemoveRecord. At that time the underlying mDNSResponder wasn't ready.

IIUC mDNSResponder supports turning the SIGPIPE off on OSX: https://github.com/apple-oss-distributions/mDNSResponder/blob/71e6611203d57c78b26fd505d98cb57a33d00880/mDNSShared/dnssd_clientstub.c#L839. So I think it should be fine for us to ignore it? Though the scope is different (per process vs per socket).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it would be better to investigate and understand this better.

We have State in Publisher to track when the underlying mDNS is ready or not, and there are explicit checks before all method calls, like this:

    if (mState!= State::kReady)
    {
        error = OTBR_ERROR_INVALID_STATE;
        std::move(aCallback)(error);
        ExitNow();
    }

So why do we get to this part of the code where it may not be ready? Do we not detect that State has changed and it is not ready? Or are there some other missing state checks somewhere?

Copy link
Contributor Author

@superwhd superwhd Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario is

  1. mdnsd gets restarted.
  2. the Publisher noticed this via kDNSServiceErr_ServiceNotRunning error code, then it tries to restart:
    Stop(kStopOnServiceNotRunningError);
  3. In Stop(), it wants to remove all registrations.
    mServiceRegistrations.clear();
  4. In the destructor of key registration, it unregisters itself by calling DNSServiceRemoveRecord.
    dnsError = DNSServiceRemoveRecord(serviceRef, mRecordRef, /* flags */ 0);

Per your idea I think we should add/move such checks into every registration type's Register or Unregister method. I can send a PR later.

However, I think this cannot fully solve the issue. The restart of mdnsd could happen at any time. When it restarts right before the DNSServiceRemoveRecord call, it will cause the SIGPIPE before Publisher handles the state change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details. I think we should also change order in Stop() and set mState first before the list/entry deallocation calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent #2725 to apply the suggested order of actions in Stop().

@superwhd superwhd requested a review from abtink February 14, 2025 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants