Skip to content

Conversation

@venky-mediboina
Copy link
Contributor

@venky-mediboina venky-mediboina commented Oct 16, 2025

COMPLETES #SPARK-610518

This pull request addresses

Saving the subnet ip details which includes subnet IPs, latency, etc..

by making the following changes

Saved the subnet ip details in reachability results and sending to web-client, implements each UDP URL checks for Reachability.

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tooling change
  • Internal code refactor

The following scenarios were tested

< ENUMERATE TESTS PERFORMED, WHETHER MANUAL OR AUTOMATED >

The GAI Coding Policy And Copyright Annotation Best Practices

  • GAI was not used (or, no additional notation is required)
  • Code was generated entirely by GAI
  • GAI was used to create a draft that was subsequently customized or modified
  • Coder created a draft manually that was non-substantively modified by GAI (e.g., refactoring was performed by GAI on manually written code)
  • Tool used for AI assistance (GitHub Copilot / Other - specify)
    • Github Copilot
    • Other - Please Specify
  • This PR is related to
    • Feature
    • Defect fix
    • Tech Debt
    • Automation

I certified that

  • I have read and followed contributing guidelines
  • I discussed changes with code owners prior to submitting this pull request
  • I have not skipped any automated checks
  • All existing and new tests passed
  • I have updated the documentation accordingly

Make sure to have followed the contributing guidelines before submitting.

@venky-mediboina venky-mediboina requested review from a team as code owners October 16, 2025 08:34
@aws-amplify-us-east-2
Copy link

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-4542.d3m3l2kee0btzx.amplifyapp.com

@venky-mediboina
Copy link
Contributor Author

Added extra fields in the Reachability Results like details (includes subnet's IP, latency, answered-tx, lost-tx, port) , minLatency, and other fields for each protocol in cluster.

Implemented Per URL check for UDP, triggering based on the flag(reachabilityEnablePerUrlForUdp) in config.ts. Took a separate class (SubnetReachabilityChecker) and used that class for every UDP URL from CLusterReachability class.

Sample Results when reachabilityEnablePerUrlForUdp: true:
Reachability Results(when flag true).json

Sample Results when reachabilityEnablePerUrlForUdp: false:
Reachability Results(when flag false).json

@marcin-bazyl marcin-bazyl added the validated If the pull request is validated for automation. label Oct 17, 2025
output.latencyInMilliseconds = value.toString();
// Only include latency fields if the protocol is actually reachable
// Don't include them for unreachable/untested protocols
if (transportResult.result === 'reachable' && typeof value === 'number' && value > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we adding these checks here?

udp: this.mapTransportResultToBackendDataFormat(clusterResult.udp || {result: 'untested'}),
tcp: this.mapTransportResultToBackendDataFormat(clusterResult.tcp || {result: 'untested'}),
udp: this.mapTransportResultToBackendDataFormat(
clusterResult.udp || {result: 'untested', details: []}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually need the details: []? the checks in line 458 will handle undefined, right?

* @returns {Promise<void>}
*/
private async storeResults(results: ReachabilityResults) {
// Cleaning up empty details arrays before storing - in case if we get sometimes all domain names, so can't prefill domain names
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it important to remove empty details arrays here? it seems like a lot of code and processing for not much benefit, The empty details are skipped anyway when results are used (in line 458), so do we really need to do this here too?

const cluster = clusterList[key];

this.clusterReachability[key] = new ClusterReachability(key, cluster);
// @ts-ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this ts-ignore here?


this.clusterReachability[key] = new ClusterReachability(key, cluster);
// @ts-ignore
const reachabilityEnablePerUrlForUdp =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: why an extra const if this config is used only in a single place?


try {
// Check IPv4 first (most common)
if (Address4.isValid(host) || Address4.isValid(cleanHost)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to allow ipv4 in square brackets? I thought only ipv6 can be in brackets

backendIpv6NativeSupport: false,
enableReachabilityChecks: true,
reachabilityGetClusterTimeout: 5000,
reachabilityEnablePerUrlForUdp: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's hard to come up with a good name for this, but this name doesn't sound grammatically correct, maybe enableReachabilityCheckPerUdpUrl would be better?

private emitResultReadyEvent(protocol: 'udp' | 'tcp' | 'xtls', functionName: string): void {
const result = this.result[protocol];

if (result.result === 'unreachable' && result.details.length > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have 2 almost identical blocks 233-247 and 249-265?

this.result.xtls = {
result: this.numXTlsUrls > 0 ? 'unreachable' : 'untested',
};
this.result.udp.result = this.numUdpUrls > 0 ? 'unreachable' : 'untested';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why has this initialisation been changed here? this PR is big enough and complicated enough without extra unnecessary changes like this. The previous code was ensuring that latencyInMilliseconds and clientMediaIPs are set to undefined whenever we do start(), the new code doesn't do that. Also, I think I've made this comment about these changes on the previous PR.

/**
* Class to handle individual URL reachability checking for any protocol
*/
class SubnetReachabilityChecker {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this class is a good start, but this is not enough - there is still a lot of duplication between this class and ClusterReachability. First, we need to rename this class to be ReachabilityPeerConnection and this new class needs to handle everything to do with RTCPeerConnection, ice candidates, offer creation etc, - it needs to do it for all protocols (udp, tcp, tls) - it needs to take the arrays of urls for all protocols as input parameters. Basically you need to move all that existing code from ClusterReachability into it, not copy, but move. Once you have that new class ReachabilityPeerConnection, then you need to change ClusterReachability to use it in 1 of these 2 ways:
if we're doing per url udp checks:

  • create 1 instance of ReachabilityPeerConnection for each udp url AND create 1 instance of ReachabilityPeerConnection for all the tcp and tls urls together
    else:
  • create just 1 single instance of ReachabilityPeerConnection for all udp, tcp and tls urls.

Furthermore, looking at this existing PR I'm beginning to realise that it's really too big and too complicated to be properly reviewed. Reachability is a very important part of our join meeting flow, so we cannot risk breaking it, so these changes need to be split into small manageable PRs that are easy to review. We cannot risk that something slips through the review and breaks our join flows. I think we need to do it in 3 smaller PRs:

  1. Create ReachabilityPeerConnection class and refactor existing code in ClusterReachability to use it - there should be no logic change at all, no behavioral change, the code should do exactly all the same as it currently does, but ClusterReachability should not have any code that deals with PeerConnection APIs or candidates. It should just create 1 instance of ReachabilityPeerConnection and ReachabilityPeerConnection should do all that work.
  2. Add support for per URL checks for UDP- in this PR we'll introduce the new config flag and when it's set, ClusterReachability will pass only TCP and TLS URLs to its single ReachabilityPeerConnection instance and will create additional array of ReachabilityPeerConnection instances - one for each UDP URL.
  3. Add all the extra stuff like getAllClustersInfo() API, populating the details and sending them to the backend etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

validated If the pull request is validated for automation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants