
[BUG] Garbage collector leaves IPs allocated (stuck) if a pod was deleted when whereabouts wasn't working #546

Noksa opened this issue Jan 16, 2025 · 0 comments

Describe the bug
If a pod that used whereabouts to allocate an additional IP from a defined pool of addresses is deleted while the whereabouts pod is not running on the same node, that IP address stays stuck until we delete it manually from several custom resources.

Expected behavior
At startup, the garbage collector should check that every allocated IP recorded in the manifests is still attached to an existing pod.

To Reproduce
Steps to reproduce the behavior:

  1. Create the following net-attach-def; we use only one IP address so the issue reproduces as quickly as possible:
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  namespace: default
  name: super-net
spec:
  config: |-
    {
      "cniVersion": "0.3.0",
      "name": "super-net",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": 
          {
            "range": "10.10.3.0/24",
            "range_end": "10.10.3.30",
            "range_start": "10.10.3.30",
            "type": "whereabouts"
          }
    }
EOF
  2. Create a pod that uses this net-attach-def:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: super-pod
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: default/super-net
  labels:
    role: super-pod
spec:
  containers:
    - name: super-pod
      image: bash:5.2
      imagePullPolicy: IfNotPresent
      command:
        - bash
        - -cex
        - |-
          trap 'exit 0' SIGINT SIGTERM
          while true; do
            sleep 1
          done
  restartPolicy: Never
  terminationGracePeriodSeconds: 3
EOF
  3. Check the pod's events; we see that the IP address has been added:
│   Normal  Scheduled       5s    default-scheduler  Successfully assigned default/super-pod to master1-1
│   Normal  AddedInterface  3s    multus             Add eth0 [10.10.20.249/32] from k8s-pod-network
│   Normal  AddedInterface  3s    multus             Add net1 [10.10.3.30/24] from default/super-net
  4. Scale the whereabouts DaemonSet down to zero and wait until its pods are gone:
kubectl -n kube-system patch daemonset whereabouts -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
  5. Delete the created pod and wait until it is deleted:
kubectl delete pod -n default super-pod
  6. Bring the whereabouts pods back:
kubectl -n kube-system patch daemonset whereabouts --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
  7. Recreate the pod with a slightly different name, as in a Deployment (pods always get different names):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: super-pod2
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: default/super-net
  labels:
    role: super-pod
spec:
  containers:
    - name: super-pod
      image: bash:5.2
      imagePullPolicy: IfNotPresent
      command:
        - bash
        - -cex
        - |-
          trap 'exit 0' SIGINT SIGTERM
          while true; do
            sleep 1
          done
  restartPolicy: Never
  terminationGracePeriodSeconds: 3
EOF
  8. Check the pod events: the pod is stuck in the ContainerCreating state:
ERRORED: error configuring pod [default/super-pod2] networking: [default/super-pod2/340a8be4-0b0c-41dd-aadf-54af1bf052e6:super-net]: error adding container to network "super-net": error at storage engine: Could not allocate IP in range: ip: 10.10.3.30 / - 10.10.3.30 / range: 10.10.3.0/24 / excludeRanges: []
  9. Check the whereabouts manifests:
# overlappingrange
kubectl get overlappingrangeipreservations.whereabouts.cni.cncf.io -n kube-system 10.10.3.30 -o yaml
apiVersion: whereabouts.cni.cncf.io/v1alpha1
kind: OverlappingRangeIPReservation
metadata:
  creationTimestamp: "2025-01-16T07:41:58Z"
  generation: 1
  name: 10.10.3.30
  namespace: kube-system
  resourceVersion: "273309"
  uid: b45c9cf3-c529-4adc-ac7e-0d31d8b35b83
spec:
  ifname: net1
  podref: default/super-pod

# ippools
kubectl get ippools.whereabouts.cni.cncf.io -n kube-system 10.10.3.0-24 -o yaml
apiVersion: whereabouts.cni.cncf.io/v1alpha1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-15T19:16:53Z"
  generation: 14
  name: 10.10.3.0-24
  namespace: kube-system
  resourceVersion: "273308"
  uid: 626c2118-6945-44a9-ace7-96f70b4a3e49
spec:
  allocations:
    "30":
      id: a56b06006c6a3e3a1eb26db82b8cd5db008f20627576e3b1c7926776bffc9ed0
      ifname: net1
      podref: default/super-pod
  range: 10.10.3.0/24

As we can see, whereabouts still thinks this IP address is allocated to super-pod, but that pod no longer exists. So the IP is stuck forever unless we remove it manually like this:

kubectl delete overlappingrangeipreservations.whereabouts.cni.cncf.io -n kube-system 10.10.3.30

And then we should remove the entry for the IP from the IPPool:

    "30":
      id: a56b06006c6a3e3a1eb26db82b8cd5db008f20627576e3b1c7926776bffc9ed0
      ifname: net1
      podref: default/super-pod

Once these two entries are removed, a new pod can allocate the freed IP address.
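For reference, the allocation key appears to be the IP's numeric offset from the start of its range (here `10.10.3.30` in `10.10.3.0/24` gives key `"30"`, matching the IPPool manifest above). A minimal bash sketch of that computation — my assumption about the keying scheme, inferred from the manifests, not from the whereabouts source:

```shell
#!/usr/bin/env bash
# Compute a whereabouts-style allocation key for an IPv4 address:
# the decimal offset of the IP from the network address of its range.
# Assumption: keys are plain decimal offsets, as seen in the IPPool above.

ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

allocation_key() {
  local ip=$1 cidr=$2
  local net=${cidr%/*} bits=${cidr#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  echo $(( $(ip_to_int "$ip") - ($(ip_to_int "$net") & mask) ))
}

allocation_key 10.10.3.30 10.10.3.0/24   # prints 30
```

This makes it easy to know which key to delete from `.spec.allocations` for a given stale IP.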

We also have a bash script that automates this procedure:

#!/usr/bin/env bash
# Delete whereabouts IP reservations whose pods no longer exist.
set -u

if [[ $# -gt 0 ]] ; then
  export KUBECONFIG=$1
fi

for IP in $(kubectl get overlappingrangeipreservations -n kube-system --no-headers | cut "-d " -f1) ; do
  POD=$(kubectl get overlappingrangeipreservations -n kube-system "${IP}" -o jsonpath='{.spec.podref}')
  RESULT=$(kubectl get pod -n "$(echo "${POD}" | cut -d/ -f1)" "$(echo "${POD}" | cut -d/ -f2)" 2>&1)
  if [[ $? -ne 0 ]] && echo "${RESULT}" | grep -q 'NotFound' ; then
    echo "Pod ${POD} not found in the cluster. Deleting IP ${IP}"
    if kubectl delete overlappingrangeipreservations -n kube-system "${IP}" ; then
      echo "OverlappingRangeIPReservation ${IP} deleted"
    fi
    for IPRANGE in $(kubectl get ippools.whereabouts.cni.cncf.io -n kube-system --no-headers | cut "-d " -f1) ; do
      # Find the allocation entry that still references the deleted pod, if any
      KEY=$(kubectl get ippools.whereabouts.cni.cncf.io "${IPRANGE}" -n kube-system -o json | jq -crM --arg pod "${POD}" '.spec.allocations | map_values(select(.podref==$pod)) | keys[0]')
      if [[ "${KEY}" != "null" ]] ; then
        # kubectl replace takes only a filename/stdin, not a resource argument
        kubectl get ippools.whereabouts.cni.cncf.io "${IPRANGE}" -n kube-system -o json | jq -crM --arg key "${KEY}" 'del(.spec.allocations[$key])' | kubectl replace -f -
        if [[ $? -eq 0 ]] ; then
          echo "IPPool ${IPRANGE} replaced"
        fi
      fi
    done
  fi
done
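The core of the script is the pair of jq filters that locate and delete the allocation entry for a given pod. They can be exercised offline against a sample IPPool document (the sample JSON below mirrors the manifest shown earlier; no cluster is needed):

```shell
# Demonstrate the script's jq filters on a sample IPPool JSON,
# shaped like the manifest above, without touching a cluster.
POOL='{"spec":{"allocations":{"30":{"id":"abc","ifname":"net1","podref":"default/super-pod"}},"range":"10.10.3.0/24"}}'

# Find the allocation key that references the stale pod
KEY=$(echo "$POOL" | jq -crM --arg pod "default/super-pod" \
  '.spec.allocations | map_values(select(.podref==$pod)) | keys[0]')
echo "$KEY"   # prints 30

# Delete that entry, leaving an empty allocations map
echo "$POOL" | jq -crM --arg key "$KEY" 'del(.spec.allocations[$key])'
```

`map_values(select(...))` drops every allocation whose `podref` does not match, so `keys[0]` yields the stale entry's key (or `null` when there is none, which the script above guards against).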

The issue is that whereabouts does not check the allocated IP addresses at startup, so it is possible to end up with stuck IP addresses.

Environment:

  • Whereabouts version : 0.8.0
  • Kubernetes version (use kubectl version): doesn't matter, reproduced on 1.30 and 1.31
  • Network-attachment-definition: see above
  • Whereabouts configuration (on the host): N/A
  • OS (e.g. from /etc/os-release): Ubuntu
  • Kernel (e.g. uname -a): Linux master1-1 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Others: N/A

Additional info / context
As far as I can see, there is only one handler in the code for the deletion event (triggered when a pod is actually deleted from the cluster):

	podsInformer.AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			DeleteFunc: func(obj interface{}) {
				onPodDelete(queue, obj)
			},
		})

What I can suggest is either adding a finalizer to pods so they cannot go away before whereabouts has cleaned up their reservations, or also running a garbage-collection pass at startup.
