
Update config to remove need for multiple Ingress objects #236

Draft · wants to merge 4 commits into base: main

Conversation

@jacekn (Contributor) commented Jan 30, 2025

This code removes the supercluster-deployed Ingress and modifies the code to allow the `SimplePayment` mission to pass with a catch-all nginx proxy.
The proxy config I used is:
```
server {
  listen 80 default_server;
  server_name _;
  resolver 10.96.0.10 ipv6=off;
  location ~ ^/(.+)-([0-9]+)/core$ {
    proxy_pass http://$1-$2.$1.stellar-supercluster.svc.cluster.local:11626/;
  }
  location ~ ^/(.+)-([0-9]+)/core/(.*)$ {
    proxy_pass http://$1-$2.$1.stellar-supercluster.svc.cluster.local:11626/$3$is_args$args;
  }

  location ~ ^/(.+)-([0-9]+)/history$ {
    proxy_pass http://$1-$2.$1.stellar-supercluster.svc.cluster.local:80/;
  }
  location ~ ^/(.+)-([0-9]+)/history/(.*)$ {
    proxy_pass http://$1-$2.$1.stellar-supercluster.svc.cluster.local:80/$3;
  }
}
```

To allow me to target pods directly I had to modify `serviceName` in the StatefulSet.
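
For context, the naming the proxy relies on is the standard StatefulSet/headless-Service pairing: per-pod DNS records of the form `{pod}.{serviceName}.{namespace}.svc.cluster.local` only exist when the StatefulSet's `serviceName` names a headless Service. A minimal sketch with purely hypothetical names (not the ones supercluster generates):

```
# Hypothetical names for illustration only; supercluster generates its own.
apiVersion: v1
kind: Service
metadata:
  name: ssc-example-core            # headless service the StatefulSet registers its pods under
spec:
  clusterIP: None                   # headless: no virtual IP, only per-pod DNS records
  selector:
    app: ssc-example-core
  ports:
    - name: core-http
      port: 11626
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ssc-example-core
spec:
  serviceName: ssc-example-core     # must name the headless service above for pod DNS to resolve
  replicas: 3
  selector:
    matchLabels:
      app: ssc-example-core
  template:
    metadata:
      labels:
        app: ssc-example-core
    spec:
      containers:
        - name: stellar-core
          image: stellar/stellar-core   # placeholder image
          ports:
            - containerPort: 11626      # core HTTP port the proxy forwards to
```

With this pairing, pod 0 resolves as `ssc-example-core-0.ssc-example-core.<namespace>.svc.cluster.local:11626`, which is the shape of name the `proxy_pass` directives above target.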
@jayz22 jayz22 self-requested a review February 7, 2025 21:32
@jayz22 (Contributor) commented Feb 13, 2025

@jacekn I pushed a commit to unify the service name to match the statefulset name.

However, your proxy routing rule needs to be modified to adapt to different namespaces, i.e.:

  location ~ ^/(.+)-([0-9]+)-(.+)/core$ {
    proxy_pass http://$1-$2.$1.$3.svc.cluster.local:11626/;
  }

This is because the PeerDns name follows the pattern: `{$pod-specific-string}.{$serviceName}.{$nameSpace}.svc.cluster.local`.

I've tested it with SimplePayment and the service name and external name both match what's expected.

I believe this should unblock your testing now. Let me know if you need any more help.

@jacekn (Contributor, Author) commented Feb 13, 2025

@jayz22 thank you! I tested things but for me it didn't fully work. First I had to update src/FSLibrary.Tests/Tests.fs to use:

let domain = nonceStr + "-sts-test." + ctx.namespaceProperty + ".svc.cluster.local"

That allowed me to build the code. However, during the SimplePayment mission I noticed that the statefulset called ssc-1023z-54a4f0-sts-core had serviceName: ssc-1023z-54a4f0-stellar-core, as opposed to the ssc-1023z-54a4f0-sts-core that I expected.
Is it possible that we need to update serviceName in other places to use the new value?

@jayz22 (Contributor) commented Feb 13, 2025

the statefulset called ssc-1023z-54a4f0-sts-core had serviceName: ssc-1023z-54a4f0-stellar-core, as opposed to the ssc-1023z-54a4f0-sts-core that I expected.

That is a headless service (clusterIP: None) so it shouldn't be needed for routing per my understanding (its only purpose is to register the internal sts pods with the DNS). We can change it if it's really necessary (it'll require some arg plumbing because the "core" part is not a fixed string but the name of the coreSet).

Also, thinking about deployment: I think we need to have the proxy server as part of the supercluster mission (in this code base), because this is a public project and someone else could be running their own supercluster missions in a different cluster, and it needs to work out of the box.
(We might need elevated permissions for stellar-supercluster, and for the core members who need to run this locally, to be able to deploy the nginx service?)
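
For reference, here is a minimal sketch of what deploying such a proxy from a mission could look like: a Deployment plus Service wrapping nginx, with the catch-all config mounted from a ConfigMap. The object names (`supercluster-proxy`, `proxy-nginx-conf`) and the image tag are made up for illustration; none of this is part of this PR:

```
# Hypothetical sketch only; names (supercluster-proxy, proxy-nginx-conf) are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supercluster-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: supercluster-proxy
  template:
    metadata:
      labels:
        app: supercluster-proxy
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
          volumeMounts:
            - name: conf
              mountPath: /etc/nginx/conf.d    # default.conf with the catch-all server block
      volumes:
        - name: conf
          configMap:
            name: proxy-nginx-conf            # ConfigMap holding the proxy config from this thread
---
apiVersion: v1
kind: Service
metadata:
  name: supercluster-proxy
spec:
  selector:
    app: supercluster-proxy
  ports:
    - port: 80
      targetPort: 80
```

A single proxy like this, combined with the namespace-aware routing discussed below, could serve missions in any namespace.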

@jacekn (Contributor, Author) commented Feb 14, 2025

I think it's needed, at least that's my understanding. The SimplePayment mission, from what I remember, will:

  1. Spin up StatefulSets
  2. Call pods over http to schedule network upgrade
  3. Call info endpoint to confirm network was upgraded
  4. Submit TX (assuming over http)
  5. Use http to confirm the TX went through

So when I run SSC on my laptop I believe steps 2-5 need access to the core http port

@jayz22 (Contributor) commented Feb 25, 2025

I think it's needed, at least that's my understanding. The SimplePayment mission, from what I remember, will:

  1. Spin up StatefulSets
  2. Call pods over http to schedule network upgrade
  3. Call info endpoint to confirm network was upgraded
  4. Submit TX (assuming over http)
  5. Use http to confirm the TX went through

So when I run SSC on my laptop I believe steps 2-5 need access to the core http port

Sure, the headless service is needed, but only for exposing the statefulset's internal DNS name.

From what I understand, you are creating a proxy which routes external requests to internal services, e.g.

GET http://myproxy.com/node-3/core/status

will be resolved to the internal service

http://node-3.node.stellar-supercluster.svc.cluster.local:11626/status

The headless service does not have an external name, so it should be irrelevant to your proxy. If it does turn out to matter, then there is probably something going wrong with your setup.

Anyway, I've pushed another change to unify the headless service name to be the same as the StatefulSet's. You might need to iron out some small details, but as far as naming consistency goes, this should work. Here is what I observe:

regular service (type: ExternalName, one-per-pod): [screenshot]

headless service (type: ClusterIP, one-per-sts): [screenshot]
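
For readers without the screenshots, the per-pod routing object is an ExternalName Service; here is a hypothetical sketch, assuming each one aliases its pod's headless-service DNS name (names and namespace are made up):

```
# Hypothetical sketch of a per-pod ExternalName service; names and namespace are made up.
apiVersion: v1
kind: Service
metadata:
  name: ssc-example-sts-core-0
  namespace: stellar-supercluster
spec:
  type: ExternalName
  # DNS lookups of this service return a CNAME to the pod's record under the
  # headless service, instead of a cluster IP.
  externalName: ssc-example-sts-core-0.ssc-example-sts-core.stellar-supercluster.svc.cluster.local
```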

Let me know if this works.

@jacekn (Contributor, Author) commented Mar 6, 2025

@jayz22 thank you very much, the code works great. I had to make one small adjustment to the test but other than that it worked:

diff --git a/src/FSLibrary.Tests/Tests.fs b/src/FSLibrary.Tests/Tests.fs
index eae2c06..939bee5 100644
--- a/src/FSLibrary.Tests/Tests.fs
+++ b/src/FSLibrary.Tests/Tests.fs
@@ -137,7 +137,7 @@ type Tests(output: ITestOutputHelper) =
         let peer1DNS = (nCfg.PeerDnsName coreSet 1).StringName
         let peer2DNS = (nCfg.PeerDnsName coreSet 2).StringName
         let nonceStr = nCfg.networkNonce.ToString()
-        let domain = nonceStr + "-sts-core." + ctx.namespaceProperty + ".svc.cluster.local"
+        let domain = nonceStr + "-sts-test." + ctx.namespaceProperty + ".svc.cluster.local"
         Assert.Equal(nonceStr + "-sts-test-0." + domain, peer0DNS)
         Assert.Equal(nonceStr + "-sts-test-1." + domain, peer1DNS)
         Assert.Equal(nonceStr + "-sts-test-2." + domain, peer2DNS)

I also tested the code with ingress creation uncommented. It worked as well.

I used the following proxy config:

  default.conf: |
    server {
      listen 80 default_server;
      server_name _;
      resolver 10.96.0.10 ipv6=off;

      if ($host ~* "^[^.]+\.([^.]+)\..*$" ) {
        set $namespace $1;
      }

      location ~ ^/(.+)-([0-9]+)/core$ {
        proxy_pass http://$1-$2.$1.$namespace.svc.cluster.local:11626/;
      }
      location ~ ^/(.+)-([0-9]+)/core/(.*)$ {
        proxy_pass http://$1-$2.$1.$namespace.svc.cluster.local:11626/$3$is_args$args;
      }

      location ~ ^/(.+)-([0-9]+)/history$ {
        proxy_pass http://$1-$2.$1.$namespace.svc.cluster.local:80/;
      }
      location ~ ^/(.+)-([0-9]+)/history/(.*)$ {
        proxy_pass http://$1-$2.$1.$namespace.svc.cluster.local:80/$3;
      }
    }
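
The `$host` capture above assumes the external hostname carries the namespace as its second DNS label. A hypothetical Ingress pointing at the `supercluster-proxy` Service sketched earlier (host, domain, and service name are made up) would look roughly like:

```
# Hypothetical example; host, domain, and service name are made up.
# The second label of the host ("stellar-supercluster" here) is what the
# proxy's $host regex extracts into $namespace.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ssc-example-ingress
spec:
  rules:
    - host: ssc-example.stellar-supercluster.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: supercluster-proxy   # the catch-all nginx proxy
                port:
                  number: 80
```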

@jayz22 (Contributor) commented Mar 6, 2025

@jayz22 thank you very much, the code works great. I had to make one small adjustment to the test but other than that it worked:

I also tested the code with ingress creation uncommented. It worked as well.

Glad to hear! And 👍 on the ingress config change that captures the namespace properly.

I'm very curious about the performance impact on coreDNS with this change. From what I understand, your approach removes the need to keep track of Ingress objects, which should help coreDNS performance during mission startup, whereas the actual pod count remains the same, so the (steady-state) hit to coreDNS for DNS resolution will likely stay the same. Curious about the overall impact; I hope this at least significantly reduces the peak/startup load you observed previously.

@jacekn (Contributor, Author) commented Mar 12, 2025

I ran a comparison between the current and the new design. For the new design I injected the proxy service manually, so it wasn't a 100% automated test, but I think it's good enough to compare coreDNS impact. All parameters were the same, and I ran only the "small tests" mission, which is the one I saw having the largest impact on coreDNS.

Current code resulted in a peak of around 24k rps and almost saturated the coreDNS pod CPU: [graph: dns_old1]

New code resulted in just over 60 rps and ~1% coreDNS CPU usage: [graph: dns_new1]

@jayz22 (Contributor) commented Mar 12, 2025

I ran a comparison between the current and the new design. For the new design I injected the proxy service manually, so it wasn't a 100% automated test, but I think it's good enough to compare coreDNS impact.

Wow, this looks really promising! I'm also wondering if you saw any change in the steady-state rps (i.e. the range after the initial peak), or whether it is too small to matter.
