Support controller-wide HA Daemon configuration #331
A similar use-case would be wanting to avoid a vertex having all of its pods on the same node and that node being scaled down by the cluster-autoscaler, causing a temporary disruption in the data stream. Having to configure affinity for each vertex would be tedious, and it can't be done outside of the vertex, as you need both the pipeline and vertex names in the label match:
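Roughly something like this per vertex (a sketch only — the exact placement of `affinity` under the vertex and the `numaflow.numaproj.io/vertex-name` label key are assumptions on my part; the pipeline-name label key is the one discussed in this issue):

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  vertices:
    - name: my-vertex
      # assumed field placement: pod-template settings directly under the vertex
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    # the vertex-name label key is an assumption
                    numaflow.numaproj.io/pipeline-name: example-pipeline
                    numaflow.numaproj.io/vertex-name: my-vertex
```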
I kind of think it's not good to have things like PDB incorporated into the Pipeline spec, since people can configure this by themselves - of course we should provide documentation on how to configure it (mainly for the labels). Reasons for that are:

The affinity example is actually the reason I was thinking the other day about whether we should expose controller-level customization - the affinity spec itself is already very flexible, and having all of this customization makes things even more flexible and complicated. I'm not quite sure whether it's good or not. Thoughts?
Yep, that makes sense to me. What I'm struggling with is that on one cluster I want the ability to read/edit/diff/run any pipelines easily, and then I want to promote them to another cluster while ensuring that certain PDBs, affinity, annotations, replicas, resources, etc. exist on vertices, daemon, and jobs. FWIW this isn't really a numaflow-specific need, as it applies to web apps as well. I think the main difference is that a Deployment resource has a single pod template, whereas a single Pipeline resource can have a lot of them. Maybe in the future a Pipeline can reference other resources to populate some config, which might help break things up into multiple files and promote re-use, making it more manageable to work with Pipelines that have a lot of yaml. Anyway, for now maybe it's best to close #314; we can always re-open or borrow from it in the future if it's useful. I think for now I'll write a migration script for my pipeline resources (in my personal github) that adds the bits I'm looking for. That gets me what I need for now since numaflow
Not sure which issues are best to have open. Sounds like we close this one; the milestone for #305 sounds TBD, so I don't know whether that one makes sense to keep open or close?
Summary
#305 simplifies Pipeline yaml for daemon deployments for resources, replicas, annotations, etc. However it's not particularly useful for anything requiring the pipeline name (i.e. affinity's `labelSelector` or podDisruptionBudget's `matchLabels`). `PodDisruptionBudget` being a bit different from affinity, since it's a separate resource that currently the controller doesn't manage.

Would be helpful to support configuring these things on the controller, maybe one of the following?:

- `templates`: go-templates that get parsed, and then in reconcile get `execute`'d with access to the Pipeline object, allowing user access to the pipeline resource (i.e. `numaflow.numaproj.io/pipeline-name: {{ .Pipeline.Name }}`)

PodDisruptionBudgets would need to be owned by the Pipeline resource and managed by the controller.
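For illustration only, such a controller-level `templates` entry could look roughly like this (purely hypothetical — the `templates` key is not an existing numaflow config option, the ConfigMap name and the `app.kubernetes.io/component: daemon` label are assumptions; only the `{{ .Pipeline.Name }}` templating idea is from the proposal above):

```yaml
# Hypothetical sketch of the proposed controller-wide "templates" config --
# not an existing numaflow ConfigMap key.
apiVersion: v1
kind: ConfigMap
metadata:
  name: numaflow-controller-config
data:
  templates: |
    podDisruptionBudget: |
      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: {{ .Pipeline.Name }}-daemon-pdb
      spec:
        minAvailable: 1
        selector:
          matchLabels:
            numaflow.numaproj.io/pipeline-name: {{ .Pipeline.Name }}
            # assumption: daemon pods carry this component label
            app.kubernetes.io/component: daemon
```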
Use Cases
Avoid needing to add something like the following to each pipeline to ensure the daemon is always available (replace `example-pipeline` with the pipeline name):
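A sketch of the kind of PodDisruptionBudget meant here (the `app.kubernetes.io/component: daemon` selector label is an assumption about how the daemon pods are labeled):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pipeline-daemon-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      numaflow.numaproj.io/pipeline-name: example-pipeline
      # assumption: daemon pods are labeled with this component value
      app.kubernetes.io/component: daemon
```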
This isn't super important for me atm, but it would be nice to support for a simpler production install. Otherwise, in clusters that scale up/down a lot, the k8s cluster autoscaler can terminate the nodes containing all the daemon pods for a particular Pipeline at the same time and cause a disruption for anyone querying the daemon.
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.