Skip to content

Conversation

@marcin-bazyl
Copy link
Collaborator

@marcin-bazyl marcin-bazyl commented Oct 1, 2025

COMPLETES #SPARK-719044

This pull request addresses

Sometimes we can get into a state where we will be flooding Homer with media requests:
it's caused by an optimization in Homer where they send us "live" source state for a remote participant after they receive a media request from us even before they are 100% sure that that participant is actually sending video, so if that participant is actually not sending video, very soon afterwards (within milliseconds) they send us another update with "avatar" source state. Each such update may cause us to send a new media request (because we have different number of live remote participants, so we adjust the maxFs in the requests to not go over the this.degradationPreferences.maxMacroblocksLimit). Our new media requests can trigger Homer again to send us a fake "live", so we get into a loop.

by making the following changes

Added throttling of sendMediaRequests() that's called as a result of Homer notifications:

  • for source update changes
  • for number of sources available

Homer team wanted us to throttle this at 5s, but I've set it to 2s, because 5s seems a bit long. In the logs from the jira Homer was sending us the fake "live" followed by "avatar" within milliseconds, so 2s should be enough in most cases.

There is 1 use case where user can see the delay caused by this throttling:

  • when there are less than 6 other participants and user ticks the "Hide participants without video" option and one remote participant mutes their video - at that point we receive 2 notifications from Homer, first one is the source update, followed by the change of numLiveSources, so because of the throttling, we send new media request after the source update and before numLiveSources change and only send the 2nd media request that results from numLiveSources change after 2s - as a result user will see the avatar of that person for 2s.

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tooling change
  • Internal code refactor

The following scenarios where tested

unit tests

The GAI Coding Policy And Copyright Annotation Best Practices

  • GAI was not used (or, no additional notation is required)
  • Code was generated entirely by GAI
  • GAI was used to create a draft that was subsequently customized or modified
  • Coder created a draft manually that was non-substantively modified by GAI (e.g., refactoring was performed by GAI on manually written code)
  • Tool used for AI assistance (GitHub Copilot / Other - specify)
    • Github Copilot
    • Other - Please Specify
  • This PR is related to
    • Feature
    • Defect fix
    • Tech Debt
    • Automation

I certified that

  • I have read and followed contributing guidelines

  • I discussed changes with code owners prior to submitting this pull request

  • I have not skipped any automated checks

  • All existing and new tests passed

  • I have updated the documentation accordingly


Make sure to have followed the contributing guidelines before submitting.

@marcin-bazyl marcin-bazyl requested review from a team as code owners October 1, 2025 11:48
@marcin-bazyl marcin-bazyl added the validated If the pull request is validated for automation. label Oct 1, 2025
@bbaldino
Copy link

bbaldino commented Oct 1, 2025

I'm not crazy about this change, I don't see why we should be sacrificing web client responsiveness to handle this problem. If the web client gets a JMP message that impacts its layout, we want to respond quickly. I don't think one extra message (because of an optimization attempt on the homer side) seems like a good reason to add a possible delay of 2 seconds.

The loop scenario isn't fully clear to me but if anything it sounds like a bug we should figure out by some other means than the throttling.

@marcin-bazyl
Copy link
Collaborator Author

marcin-bazyl commented Oct 2, 2025

I'm not crazy about this change, I don't see why we should be sacrificing web client responsiveness to handle this problem. If the web client gets a JMP message that impacts its layout, we want to respond quickly. I don't think one extra message (because of an optimization attempt on the homer side) seems like a good reason to add a possible delay of 2 seconds.

The loop scenario isn't fully clear to me but if anything it sounds like a bug we should figure out by some other means than the throttling.

I'm also not a fan of this solution, but it felt like it's going to be hard to convince Homer to change their behavior. I can hold off with this PR for now and we can discuss this further with Homer team after the Chinese holidays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

validated If the pull request is validated for automation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants