When Triton servers are deployed on interLink virtual nodes (in Slurm jobs), a single-server setup works fine. However, when the number of clients is high enough to trigger autoscaling, the client-server connections break and the clients fail.
My main suspicion is that servers started on interLink nodes take too long to become ready (establishing the wstunnel, pulling the Singularity image, and possibly other steps).
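One way to check this suspicion is to poll Triton's standard readiness endpoints from inside the cluster and log how long a freshly scheduled replica takes to report ready. A minimal sketch with the Triton Python client; `triton-service:8000` and `my_model` are placeholders for the actual service address and model name:

```python
"""Poll a freshly scheduled Triton replica and log how long it takes to become ready.

TRITON_URL and MODEL_NAME are placeholders; point them at the actual service/model.
"""
import time

import tritonclient.http as httpclient

TRITON_URL = "triton-service:8000"  # assumed in-cluster service name and HTTP port
MODEL_NAME = "my_model"             # assumed model name
POLL_INTERVAL_S = 5
TIMEOUT_S = 900

client = httpclient.InferenceServerClient(url=TRITON_URL)
start = time.monotonic()

while True:
    elapsed = time.monotonic() - start
    if elapsed > TIMEOUT_S:
        raise TimeoutError(f"Triton not ready after {TIMEOUT_S}s")
    try:
        live = client.is_server_live()
        ready = client.is_server_ready()
        model_ready = client.is_model_ready(MODEL_NAME)
    except Exception as exc:
        # Connection errors are expected while wstunnel setup / Singularity pull are still running.
        print(f"[{elapsed:6.1f}s] no connection yet: {exc}")
    else:
        print(f"[{elapsed:6.1f}s] live={live} ready={ready} model_ready={model_ready}")
        if ready and model_ready:
            break
    time.sleep(POLL_INTERVAL_S)

print(f"Replica became ready after {time.monotonic() - start:.1f}s")
```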
Possible ways to fix:
- tweak the readiness probes for the Triton servers (see the probe sketch after this list)
- it would be even better if the Triton server itself appeared ready a bit later, eliminating the need to fine-tune the Kubernetes probes
- debugging this is currently inconvenient; better aggregation of the server logs might help
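For the probe tweak, a sketch of a more tolerant readiness probe on Triton's standard `/v2/health/ready` endpoint, applied here with the Kubernetes Python client; the deployment, namespace, and container names as well as the timing values are assumptions to be tuned against the measured startup times:

```python
"""Relax the readiness probe of the Triton deployment so slow interLink startup
(wstunnel, Singularity image pull) does not mark the pod unready too early.

Deployment, namespace, and container names below are assumptions; adjust them.
"""
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
apps = client.AppsV1Api()

# Triton exposes its readiness endpoint on the HTTP port (8000 by default).
readiness_probe = {
    "httpGet": {"path": "/v2/health/ready", "port": 8000},
    "initialDelaySeconds": 120,  # allow time for wstunnel + image pull
    "periodSeconds": 10,
    "timeoutSeconds": 5,
    "failureThreshold": 30,      # keep probing for several more minutes before giving up
}

# Strategic merge patch: containers are merged by name, so only the probe changes.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "triton", "readinessProbe": readiness_probe}
                ]
            }
        }
    }
}

apps.patch_namespaced_deployment(name="triton-server", namespace="default", body=patch)
```

Alternatively, a `startupProbe` with a large `failureThreshold` would cover the slow startup without loosening the steady-state readiness check.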