
Introduce Leader Election and use Deployments for agents #61


Open · wants to merge 27 commits into base: main

Conversation

davidepasquero

The agents have been changed from standalone Pods to a Deployment so that their lifecycle is managed by Kubernetes.


mergify bot commented Jul 4, 2025

This pull request is now in conflict. Could you fix it @davidepasquero? 🙏

davidepasquero1 and others added 27 commits July 4, 2025 21:14
Signed-off-by: davidepasquero <[email protected]>
Signed-off-by: davidepasquero <[email protected]>
Signed-off-by: davidepasquero <[email protected]>
This change introduces leader election to the VM DHCP agent, mirroring the
mechanism used in the controller. This ensures that only one instance of the
agent is active at a time, preventing potential conflicts and ensuring stable
operation in a clustered environment.

The implementation utilizes the `github.com/rancher/wrangler/pkg/leader`
package. A `noLeaderElection` flag has been added to the agent's command-line
options to allow bypassing leader election for development or testing purposes.

Signed-off-by: davidepasquero <[email protected]>
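
As a reference, a minimal sketch of what this wiring could look like with the wrangler leader package; the lock name "vm-dhcp-agent", the `run()` callback, and the in-cluster client setup are illustrative assumptions, not code from this change:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/rancher/wrangler/pkg/leader"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	ctx := context.Background()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Namespace holding the election lock; a later commit in this PR sources it
	// from the Downward API (POD_NAMESPACE) instead of a hardcoded namespace.
	namespace := os.Getenv("POD_NAMESPACE")

	// In the agent this would come from the noLeaderElection command-line option.
	noLeaderElection := false
	if noLeaderElection {
		run(ctx)
		return
	}

	// RunOrDie blocks until ctx is cancelled; the callback fires only on the
	// instance that currently holds the lock.
	leader.RunOrDie(ctx, namespace, "vm-dhcp-agent", client, run)
}

func run(ctx context.Context) {
	log.Println("became leader, starting the VM DHCP agent")
	// start informers and the DHCP server here
}
```
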
Refactors the Helm chart to deploy the VM DHCP agent as a separate
Kubernetes Deployment, rather than being managed directly by the controller.
This change aligns with standard Kubernetes practices and improves
resilience and manageability.

Key changes include:
- Added a new Deployment manifest for the agent (`agent-deployment.yaml`).
- Introduced dedicated RBAC resources (ServiceAccount, ClusterRole,
  ClusterRoleBinding) for the agent to enable leader election and
  dynamic IPPool discovery.
- Updated `values.yaml` to provide configuration options for the new
  agent deployment and its associated resources.
- Modified the controller's Deployment manifest to remove the direct
  management and launching of the agent.

This chart refactoring is a prerequisite for upcoming Go code changes
in the agent to implement IPPool watching and in the controller to
remove agent process management. The agent will now rely on its own
leader election mechanism, implemented in a previous change.

Signed-off-by: davidepasquero <[email protected]>
This commit removes the controller's responsibility for creating,
managing, and monitoring dedicated agent pods for IPPools. This
functionality is being shifted to a separate agent deployment.

Changes include:
- Removed DeployAgent and MonitorAgent logic from IPPool controller.
- Deleted status.agentPodRef and AgentReady condition from IPPool types.
- Removed CLI flags and configuration options related to agent images,
  namespaces, service accounts, and the no-dhcp flag for agents.
- Cleaned up associated helper functions, structs, and unused code.

Signed-off-by: davidepasquero <[email protected]>
Signed-off-by: davidepasquero <[email protected]>
Signed-off-by: davidepasquero <[email protected]>
Signed-off-by: davidepasquero <[email protected]>
The agent was previously hardcoded to use the 'kube-system' namespace
for its leader election lock. This caused permission errors as the
agent's ServiceAccount was not authorized for that namespace.

This commit changes the agent to use its own namespace (obtained via
the Downward API through the POD_NAMESPACE environment variable)
for leader election. The Helm chart has been updated to provide this
environment variable to the agent deployment.

Signed-off-by: davidepasquero <[email protected]>
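
A minimal sketch of the namespace resolution on the Go side; the function name and the hard failure when the variable is missing are assumptions, the commit only states that POD_NAMESPACE is read and injected via the Downward API:

```go
package agent

import (
	"fmt"
	"os"
)

// electionNamespace resolves the namespace used for the leader election lock.
// POD_NAMESPACE is injected via the Downward API (fieldRef: metadata.namespace)
// by the agent Deployment template.
func electionNamespace() (string, error) {
	ns := os.Getenv("POD_NAMESPACE")
	if ns == "" {
		return "", fmt.Errorf("POD_NAMESPACE is not set; cannot determine the leader election namespace")
	}
	return ns, nil
}
```
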
This commit updates the labels in `agent-clusterrole.yaml` to use
the common labels helper and override the component, making it
consistent with the approach used in `agent-clusterrolebinding.yaml`.

This is a speculative change and is unlikely to fix the persistent 'forbidden'
error on its own if the RBAC rules were otherwise correctly defined and
applied, but it improves chart consistency.

Signed-off-by: davidepasquero <[email protected]>
The webhook was failing TLS verification because its certificate was not
valid for the FQDN the API server was using (e.g.,
`webhook-name.namespace.svc`). This was due to the underlying
`harvester/webhook` library not receiving the correct namespace during
certificate generation, leading to a malformed SAN (e.g.,
`webhook-name..svc`).

This commit adds the `--namespace {{ .Release.Namespace }}` argument to
the webhook container startup, ensuring the certificate is generated
with the correct SANs.

Note: A potential issue was observed where the webhook deployment
specifies a ServiceAccount name
(`{{fullname}}-webhook-webhook`) that might not be explicitly created
by the chart. This may need further investigation if webhook pods
encounter permission issues or fail to start due to SA problems, though
it's unrelated to the certificate SAN issue.

Signed-off-by: davidepasquero <[email protected]>
This commit changes the metadata.name for the agent's ClusterRole
and ClusterRoleBinding to use a simpler naming convention based directly
on `{{ .Release.Name }}`. This is an attempt to eliminate potential
issues from complex name templating for these RBAC resources, in response
to persistent 'configmaps forbidden' errors for the agent.

The ServiceAccount name itself remains unchanged as it correctly matches
the user identified in the error logs.

Signed-off-by: davidepasquero <[email protected]>
This commit refactors the system to allow the main vm-dhcp-controller
to dynamically update the vm-dhcp-agent's Deployment based on IPPool
resources. When an IPPool is detected, the controller will now:

1. Parse `IPPool.spec.networkName` to determine the target
   NetworkAttachmentDefinition (NAD).
2. Update the agent Deployment's pod template to include the
   `k8s.v1.cni.cncf.io/networks` annotation pointing to this NAD.
3. Update the `--nic` argument of the agent container to use the
   interface name associated with the Multus attachment (e.g., `net1`).

Changes include:
- Added RBAC permissions for the controller to get, list, watch, and patch
  Deployments in its namespace.
- Modified `pkg/controller/ippool/controller.go` to implement the
  logic for fetching and updating the agent Deployment.
- Updated `pkg/config/context.go` and `cmd/controller/run.go` to provide
  necessary KubeClient and namespace information to the controller logic.
- Reverted previous static Multus configuration from `values.yaml` and
  the agent's Helm template, as this is now handled dynamically.

The controller relies on an `AGENT_DEPLOYMENT_NAME` environment variable
(to be set in its own deployment manifest) to accurately locate the agent
deployment for updates.

Signed-off-by: davidepasquero <[email protected]>
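
A hedged sketch of the Deployment update described above, assuming a standard client-go clientset; the function and helper names (`updateAgentDeployment`, `setFlag`) are illustrative, while the annotation key, the `--nic` argument, and the `AGENT_DEPLOYMENT_NAME` variable come from the commit message:

```go
package ippool

import (
	"context"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// updateAgentDeployment attaches the agent Deployment to the NAD referenced by
// an IPPool and points the --nic argument at the Multus interface (e.g. net1).
// nadRef is "<namespace>/<name>" parsed from IPPool.spec.networkName.
func updateAgentDeployment(ctx context.Context, client kubernetes.Interface, namespace, nadRef, nic string) error {
	name := os.Getenv("AGENT_DEPLOYMENT_NAME") // set in the controller's own manifest

	dep, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	tmpl := &dep.Spec.Template
	if tmpl.Annotations == nil {
		tmpl.Annotations = map[string]string{}
	}
	tmpl.Annotations["k8s.v1.cni.cncf.io/networks"] = nadRef

	for i := range tmpl.Spec.Containers {
		tmpl.Spec.Containers[i].Args = setFlag(tmpl.Spec.Containers[i].Args, "--nic", nic)
	}

	_, err = client.AppsV1().Deployments(namespace).Update(ctx, dep, metav1.UpdateOptions{})
	return err
}

// setFlag replaces the value following flag, or appends flag and value if absent.
func setFlag(args []string, flag, value string) []string {
	for i, a := range args {
		if a == flag && i+1 < len(args) {
			args[i+1] = value
			return args
		}
	}
	return append(args, flag, value)
}
```
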
This commit addresses several Go build errors in `pkg/controller/ippool/controller.go`:
- Removed unused imports for `k8s.io/api/apps/v1` (appsv1) and `encoding/json`.
- Added the missing import for the `os` package.
- Modified the `getAgentDeploymentName` function to not accept arguments,
  as its implementation uses `os.Getenv` and did not use the parameter.
- Updated the call to `getAgentDeploymentName` accordingly.

Signed-off-by: davidepasquero <[email protected]>
Removes a duplicate import of `k8s.io/apimachinery/pkg/apis/meta/v1`
aliased as `metav1` in `pkg/controller/ippool/controller.go` to fix
the 'metav1 redeclared in this block' build error.

Signed-off-by: davidepasquero <[email protected]>
This commit extends the dynamic agent deployment modification logic.
When the IPPool controller reconciles an IPPool and updates the agent
deployment for NAD attachment (Multus annotation and --nic arg), it
now also updates the `IPPOOL_REF` environment variable in the agent's
container.

This ensures that the agent is not only connected to the correct network
but is also configured to manage and serve IPs for the intended IPPool.

Changes:
- Modified `syncAgentDeployment` in `pkg/controller/ippool/controller.go`
  to find/update or add the `IPPOOL_REF` env var.
- Added `corev1` import for `corev1.EnvVar`.
- Ensured `IPPOOL_REF` is defined in the agent deployment template
  (`chart/templates/agent-deployment.yaml`) so the controller can update it.

Signed-off-by: davidepasquero <[email protected]>
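
A minimal sketch of the IPPOOL_REF update on the agent container; the helper name is hypothetical, while the env var name and the namespace/name reference format follow this commit:

```go
package ippool

import corev1 "k8s.io/api/core/v1"

// setIPPoolRef points the agent container at the IPPool it should serve by
// updating (or adding) the IPPOOL_REF environment variable as "<namespace>/<name>".
func setIPPoolRef(container *corev1.Container, poolNamespace, poolName string) {
	ref := poolNamespace + "/" + poolName
	for i := range container.Env {
		if container.Env[i].Name == "IPPOOL_REF" {
			container.Env[i].Value = ref
			return
		}
	}
	container.Env = append(container.Env, corev1.EnvVar{Name: "IPPOOL_REF", Value: ref})
}
```
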
This commit removes the `envVarUpdated` variable from the
`syncAgentDeployment` function in `pkg/controller/ippool/controller.go`
as it was declared but its value was not subsequently used.
The `needsUpdate` flag already correctly handles the logic for
determining if the agent deployment requires an update.

Signed-off-by: davidepasquero <[email protected]>
This commit addresses two issues preventing the controller from dynamically
updating the agent deployment:

1. Agent Container Name: The controller now uses an `AGENT_CONTAINER_NAME`
   environment variable (set to `{{ .Chart.Name }}-agent` in its own
   deployment manifest) to reliably find the agent container within the
   agent's deployment. This replaces a hardcoded constant that could be
   mismatched.

2. Deployment Update RBAC: The controller's Role for managing deployments
   was granted the `update` verb in addition to `patch`. This resolves
   the RBAC error encountered when the client-go library attempts to update
   the agent deployment resource.

Signed-off-by: davidepasquero <[email protected]>
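
A minimal sketch of the container lookup; the env var name is from this commit, everything else is assumed:

```go
package ippool

import "os"

// getAgentContainerName returns the name used to locate the agent container
// inside the agent Deployment. It is supplied via the controller's own
// manifest as AGENT_CONTAINER_NAME ({{ .Chart.Name }}-agent).
func getAgentContainerName() string {
	return os.Getenv("AGENT_CONTAINER_NAME")
}
```
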
… log

This commit corrects a log message in `pkg/controller/ippool/controller.go`
that was still referencing the removed constant `AgentContainerNameDefault`.
The log message now correctly uses the `h.getAgentContainerName()` method
to get the agent container name, resolving the build error.

Signed-off-by: davidepasquero <[email protected]>
This commit implements the functionality for the DHCP agent to self-configure
a static IP address (the `serverIP` from the IPPool) on its Multus-attached
network interface. This allows the DHCP server running in the agent to use
the correct source IP for its replies.

Changes include:
- Fixed `--nic` argument update logic in the IPPool controller to prevent
  malformed arguments in the agent deployment.
- Added `--server-ip` and `--cidr` command-line flags to the agent.
- Updated `config.AgentOptions` to include these new fields.
- Modified the IPPool controller to extract `serverIP` and `cidr` from
  the `IPPool` spec and pass them as arguments to the agent deployment.
- Granted `NET_ADMIN` capability to the agent container to allow
  network interface modification.
- Implemented `configureInterface` method in `pkg/agent/agent.go` which
  uses `ip address add` and `ip link set up` commands to assign the
  static IP to the agent's target NIC before the DHCP server starts.

Signed-off-by: davidepasquero <[email protected]>
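
A hedged sketch of the interface configuration step, shelling out to `ip` as this commit describes; the exact signature, the CIDR-to-prefix handling, and the tolerance for an already-assigned address are assumptions. It only works with the NET_ADMIN capability granted above:

```go
package agent

import (
	"context"
	"fmt"
	"net"
	"os/exec"
	"strings"
)

// configureInterface assigns the IPPool serverIP to the Multus-attached NIC and
// brings it up before the DHCP server starts. Requires NET_ADMIN in the container.
func configureInterface(ctx context.Context, nic, serverIP, cidr string) error {
	// Derive the prefix length from the pool CIDR, e.g. 172.16.0.0/24 -> 24.
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return fmt.Errorf("invalid cidr %q: %w", cidr, err)
	}
	prefixLen, _ := ipNet.Mask.Size()
	addr := fmt.Sprintf("%s/%d", serverIP, prefixLen)

	// ip address add <serverIP>/<prefix> dev <nic>
	if out, err := exec.CommandContext(ctx, "ip", "address", "add", addr, "dev", nic).CombinedOutput(); err != nil {
		// Tolerate a restart where the address is already present on the NIC.
		if !strings.Contains(string(out), "File exists") {
			return fmt.Errorf("ip address add failed: %v: %s", err, out)
		}
	}

	// ip link set <nic> up
	if out, err := exec.CommandContext(ctx, "ip", "link", "set", nic, "up").CombinedOutput(); err != nil {
		return fmt.Errorf("ip link set up failed: %v: %s", err, out)
	}
	return nil
}
```
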
Removes a duplicate and misplaced import block that caused a
`syntax error: imports must appear before other declarations` build error.
The necessary packages were already correctly imported in the main
import block at the top of the file.

Signed-off-by: davidepasquero <[email protected]>
This commit addresses an issue where a newly started/leader DHCP agent pod
might report "NO LEASE FOUND" for existing leases. This occurred because
the DHCP server could start before its internal lease cache was fully
synchronized with the IPPool custom resource's status.

Changes:
1. Introduced an `InitialSyncDone` channel in the agent's IPPool event
   handler (`pkg/agent/ippool/event.go` and its local `controller.go`).
2. The local controller now populates the `DHCPAllocator`'s leases from
   the `IPPool.Status.IPv4.Allocated` map and then signals on the
   `InitialSyncDone` channel.
3. The main agent logic (`pkg/agent/agent.go`) now waits for this signal
   before starting the DHCP server, ensuring the cache is primed.
4. The cache population logic in the local controller includes handling for
   adding new leases and removing stale ones based on the IPPool status.

Signed-off-by: davidepasquero <[email protected]>
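
A minimal sketch of the startup gate, assuming the channel is closed exactly once after the cache is primed; only the `InitialSyncDone` name comes from this commit, the surrounding types are illustrative:

```go
package agent

import (
	"context"
	"sync"
)

// syncGate signals that the DHCPAllocator has been populated from
// IPPool.Status.IPv4.Allocated by the agent's local IPPool controller.
type syncGate struct {
	InitialSyncDone chan struct{}
	once            sync.Once
}

func newSyncGate() *syncGate {
	return &syncGate{InitialSyncDone: make(chan struct{})}
}

// markSynced is called by the local controller after the initial cache fill.
func (g *syncGate) markSynced() {
	g.once.Do(func() { close(g.InitialSyncDone) })
}

// waitAndServe blocks until the cache is primed (or ctx ends), then starts the
// DHCP server, so a newly elected leader never answers from an empty cache.
func waitAndServe(ctx context.Context, g *syncGate, startDHCP func(context.Context) error) error {
	select {
	case <-g.InitialSyncDone:
	case <-ctx.Done():
		return ctx.Err()
	}
	return startDHCP(ctx)
}
```
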
…ller

This commit addresses several build errors in `pkg/agent/ippool/controller.go`
and corrects logic within its `Update` method:

- Added missing imports for `sync` and `github.com/harvester/vm-dhcp-controller/pkg/util`.
- Removed an unused variable `currentLeasesInAllocator`.
- Corrected the logic for checking lease existence using `c.dhcpAllocator.GetLease()`.
- Ensured correct handling of `LeaseTime` type conversion for `dhcpAllocator.AddLease()`.
- Resolved method redeclaration error by deleting the duplicate file
  `pkg/agent/ippool/ippool.go`.

These changes aim to fix the build and improve the correctness of the DHCP
lease cache synchronization logic within the agent.

Signed-off-by: davidepasquero <[email protected]>
This commit addresses multiple build errors in the agent's local ippool
controller (`pkg/agent/ippool/controller.go`) and refines its lease
synchronization logic:

- Corrected `LeaseTime` type handling when calling `dhcpAllocator.AddLease`,
  passing `specConf.LeaseTime` directly as it's assumed to be `*int`
  based on compiler errors.
- Added missing `sync` and `util` package imports.
- Removed unused variables.
- Corrected the logic for checking lease existence before deletion.
- Deleted the conflicting `pkg/agent/ippool/ippool.go` file that caused
  a method redeclaration error.

These changes are intended to fix the build and ensure the agent's DHCP
lease cache is correctly populated from the IPPool status, particularly
after failovers or restarts.

Signed-off-by: davidepasquero <[email protected]>
@davidepasquero davidepasquero force-pushed the fix/agent-leader-election-namespace branch from 1981a89 to 51ff3a8 on July 4, 2025 21:16
@davidepasquero (Author)

Context

This pull request implements the approach discussed in the previous PR #54 (comment) and aims to provide a robust, Kubernetes-native solution to the agent lifecycle management problem.

Problem Solved

The previous agent management via single Pods had a critical flaw: in the event of a node crash, the agent pod would get stuck in a "Terminating" state on that node. This prevented it from being rescheduled onto a healthy node without manual intervention, causing a service disruption.

Implemented Solution

Following the feedback received, this PR introduces two fundamental changes:

  1. Transition from Pod to Deployment:

    • The agents are no longer managed as individual Pods but through a Kubernetes Deployment.
    • This delegates responsibility for managing the pod lifecycle to the Kubernetes Deployment controller. If a pod (or its node) fails, the Deployment controller will automatically create a new replica on an available node, ensuring high availability.
  2. Introduction of Leader Election:

    • Since a Deployment can manage multiple replicas, a leader election mechanism has been introduced to ensure that only one agent instance (the "leader") is active at any given time.
    • The other instances remain on standby, ready to take over if the current leader fails.
    • This prevents conflicts and ensures that operations performed by the agent are not duplicated.

Benefits of this Approach

  • Reliability: Definitively solves the problem of stuck pods.
  • Kubernetes-Native Standard: Aligns the agent's behavior with the best practices and patterns commonly adopted in the Kubernetes and Harvester ecosystem.
  • Simplified Management: It leverages native Kubernetes mechanisms, reducing the need for custom logic to handle failures.

@davidepasquero (Author)

Key Technical Changes

1. Fixed Agent Leader Election Namespace

  • Problem: Initially, the agent attempted to create its leader election lock (a ConfigMap or Lease) in the kube-system namespace, causing permission errors.
  • Solution:
    • Modified the agent's code (cmd/agent/run.go) to use its own namespace for leader election. This namespace is obtained via the POD_NAMESPACE environment variable, injected through the Downward API.
    • Updated the agent's deployment manifest (chart/templates/agent-deployment.yaml) to provide this environment variable to the pod.

2. Fixed Webhook Certificate SAN Mismatch

  • Problem: The webhook certificate failed TLS verification because it was not valid for the expected FQDN (...webhook.harvester-system.svc vs. ...webhook..svc). The cause was that the harvester/webhook library was not receiving the correct namespace during certificate generation.
  • Solution: Added the --namespace {{ .Release.Namespace }} argument to the webhook container in its deployment manifest (chart/templates/deployment.yaml) to provide the correct namespace.

3. Dynamic Agent Network Interface Management (Multus NAD Attachment)

  • Feature: Implemented functionality for the main vm-dhcp-controller to dynamically update the vm-dhcp-agent Deployment when an IPPool is detected.
  • Implementation:
    • Controller RBAC: Extended the main controller's RBAC permissions to allow it to get, list, watch, patch, and update Deployment resources in its namespace.
    • IPPool Controller Logic: Modified pkg/controller/ippool/controller.go to:
      • Fetch the agent's Deployment.
      • Add/update the k8s.v1.cni.cncf.io/networks annotation to specify the NAD (derived from IPPool.spec.networkName) to which the agent must connect.
      • Update the --nic container argument with the Multus interface name (e.g., net1).
      • Update the IPPOOL_REF environment variable in the agent container with the namespace/name of the current IPPool.
    • Agent Static IP Configuration on Multus Interface:
      • Added --server-ip and --cidr CLI flags to the agent, which are set by the main controller based on the IPPool's configuration.
      • Added the NET_ADMIN capability to the agent container to allow it to configure network interfaces.
      • Implemented logic in pkg/agent/agent.go (configureInterface method) so that the agent, upon becoming leader, configures the specified network interface with the provided static IP and CIDR.

4. Agent DHCP Lease Cache Synchronization on Startup/Failover

  • Problem: Fixed an issue where a newly elected leader agent might not find existing leases because its DHCP server started before the internal lease cache was synchronized with the IPPool's state.
  • Solution:
    • Introduced a synchronization mechanism (InitialSyncDone channel and sync.Once) to control the startup sequence.
    • The agent's local controller now first populates the DHCPAllocator cache by reading IPPool.Status.IPv4.Allocated and then signals that the initial sync is complete.
    • The agent (pkg/agent/agent.go) now waits for this signal before starting the actual DHCP server.
    • Added an AddFunc to the agent's informer to ensure the IPPool's initial state is processed immediately on startup to populate the cache.

5. Miscellaneous Build Fixes

  • Resolved several Go build errors related to missing/duplicate imports, unused variables, and incorrect logic (e.g., LeaseTime handling, references to removed constants).

@davidepasquero davidepasquero changed the title from "Fix/agent leader election namespace" to "Introduce Leader Election and use Deployments for agents" on Jul 4, 2025
@davidepasquero davidepasquero force-pushed the fix/agent-leader-election-namespace branch from 81975d5 to 6cb2aca on July 5, 2025 07:09
@starbops starbops (Member) left a comment


Hi @davidepasquero. Could you help organize your commits and provide a concise summary of what this PR is trying to achieve and how it works? Thank you.

@davidepasquero (Author)

davidepasquero commented Jul 9, 2025 via email

@davidepasquero (Author)

Hi @davidepasquero. Could you help organize your commits and provide a concise summary of what this PR is trying to achieve and how it works? Thank you.

#64
