`skills/cozystack-upgrade/SKILL.md`:

| Symptom | Likely cause |
|---------|--------------|
| HR `UninstallFailed, failed to delete release` | Stuck helm history (known-failures #1) |
| TCP `INSTALLED VERSION` diverges from `VERSION` | Kamaji upgrade stuck (known-failures #4) |
| `cozy-system` namespace gone | Missing `helm.sh/resource-policy=keep` (known-failures #7); restore from backup |
| Mass `kubevirt-evacuation-*` VMIMs in `Failed`, `qemu-kvm: error while loading state ... virtio-net` | KubeVirt upgrade crossed the QEMU bump (1.6.x → 1.7+); pre-existing VMs need cold-restart (known-failures #8) |

## KubeVirt 1.6.x → 1.8.x special handling

If Step 1's release-notes analysis shows the target Cozystack version bumps KubeVirt from 1.6.x to 1.7+ (currently 1.8.2 in `release-1.4`), live-migration of every running VM will fail until those VMs are cold-restarted. This is [kubevirt/kubevirt#16386](https://github.com/kubevirt/kubevirt/issues/16386).
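
A quick way to check what's currently installed before deciding (the KubeVirt CR reports the running operator version in `status.observedKubeVirtVersion`; field name per upstream KubeVirt, worth verifying on your build):

```bash
# currently running KubeVirt version, as reported by the KubeVirt CR
kubectl -n cozy-kubevirt get kubevirt kubevirt \
  -o jsonpath='{.status.observedKubeVirtVersion}{"\n"}'
```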

**Apply the pre-/post-upgrade workflow in `references/known-failures.md` #8 before and after `helm upgrade`.** It disables `workloadUpdateMethods` first so the operator doesn't trigger a flapping evacuation loop, then drives a paced cold-restart of all running VMs.

Coordinate with VM owners ahead of time: every VM (except explicit opt-outs) gets one ~30-60s downtime during the restart loop. Tenants who can't take that window should be added to the exclusion list; their VMs will keep running on the old QEMU until they restart them themselves.

## Common mistakes

`skills/cozystack-upgrade/references/known-failures.md`:

Restore from backup. There is no clean in-cluster recovery for a deleted `cozy-system` namespace.
3. Re-apply the Platform Package from rescue.yaml (manual review required; CRD schemas may have moved).
4. Expect tenant disruption; communicate to users.

## 8. KubeVirt 1.6.x → 1.8.x: live-migration of pre-existing VMs fails on `virtio-net`

### Symptom

After the Cozystack upgrade rolls out a new KubeVirt version that crosses the QEMU bump boundary (specifically 1.6.x → 1.7+), every live-migration that KubeVirt's `workloadUpdateMethods` triggers fails with:

```text
virError(Code=9, Domain=10, Message='operation failed: job 'migration in' failed:
load of migration failed: Operation not permitted')
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.0:00.0/virtio-net'
```

`kubectl get vmim -A` shows a growing pile of `Failed` evacuations on every running VM. KubeVirt keeps retrying — VMs stay up but the migration loop never converges.
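
A rough way to size the pile (this assumes the default `kubectl get vmim -A` layout, where phase is the third column):

```bash
# count Failed migrations, then break them down by namespace
kubectl get vmim -A --no-headers | awk '$3 == "Failed"' | wc -l
kubectl get vmim -A --no-headers | awk '$3 == "Failed" {print $1}' | sort | uniq -c
```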

### Root cause

[kubevirt/kubevirt#16386](https://github.com/kubevirt/kubevirt/issues/16386). When KubeVirt is upgraded across a QEMU version bump (e.g. `qemu-9.1.0-19.el9` → `qemu-9.1.0-20.el9`), VMs that were running before the upgrade have an in-memory device state tied to the old QEMU. The new QEMU can't reload that state for some devices (notably `virtio-net`) → migration `in` fails with `Operation not permitted`.

This is **not** specific to network/storage configuration. It affects every VM that started under the old QEMU and never restarted. New VMs and VMs restarted after the upgrade are unaffected.
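
A sketch for telling the two groups apart: launcher pods whose `startTime` predates the upgrade are the ones still carrying old-QEMU state:

```bash
# launcher pods sorted by start time; anything started before the upgrade
# window is still running on the old QEMU
kubectl get pods -l kubevirt.io=virt-launcher -A \
  --sort-by=.status.startTime \
  -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,STARTED:.status.startTime'
```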

Switching `workloadUpdateMethods` to `[Evict]` does **not** help — the `virt-launcher-eviction-interceptor` webhook converts evictions back into live-migrations because VMIs have `evictionStrategy: LiveMigrate` (an immutable field on a running VMI).
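
To see the strategy that pins this behaviour on each VMI:

```bash
# evictionStrategy per VMI; LiveMigrate means evictions get converted back
# into live-migrations
kubectl get vmi -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.evictionStrategy}{"\n"}{end}'
```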

### Recovery / workaround

The only fix is to cold-restart every VM that was running before the upgrade — that re-initialises its in-memory state under the new QEMU. The procedure below disables the operator's auto-migration before the upgrade so it doesn't trigger a flapping loop, then restarts VMs in a controlled, paced sequence.

**Run this before the `helm upgrade` (Step 5 of the main skill) when the target version crosses KubeVirt 1.6.x → 1.8.x.**

```bash
# 1. Snapshot baseline so you can verify what changed
kubectl get vmi -A -o wide > /tmp/vmis-pre-upgrade.txt
kubectl get pods -l kubevirt.io=virt-launcher -A \
-o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.containers[?(@.name=="compute")].image}{"\n"}{end}' \
> /tmp/launchers-pre-upgrade.txt
kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml > /tmp/kubevirt-pre.yaml

# 2. Disable workloadUpdateMethods so the new operator doesn't auto-migrate every VM
kubectl -n cozy-kubevirt patch kubevirt kubevirt --type=merge \
-p '{"spec":{"workloadUpdateStrategy":{"workloadUpdateMethods":[]}}}'

# 3. Suspend the kubevirt HelmRelease so Flux doesn't reconcile
# workloadUpdateMethods back from the chart values
kubectl -n cozy-kubevirt patch hr kubevirt --type=merge \
-p '{"spec":{"suspend":true}}'

# 4. Verify both took effect
kubectl -n cozy-kubevirt get kubevirt kubevirt \
-o jsonpath='{.spec.workloadUpdateStrategy.workloadUpdateMethods}{"\n"}'
# expected: []

# 5. NOW run helm upgrade for cozystack (Step 5 of the main skill).
# The control plane (virt-api/controller/handler/operator) will roll over to
# v1.8.x. Existing virt-launcher pods are NOT touched, so VMs keep running
# on the old QEMU. Live-migration BETWEEN two old launchers still works.
```
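
One way to confirm the control plane has finished rolling before starting the restart phase (assumes the KubeVirt CR exposes the upstream `Available` condition):

```bash
# block until the KubeVirt CR reports Available, then print the running version
kubectl -n cozy-kubevirt wait kubevirt kubevirt --for=condition=Available --timeout=15m
kubectl -n cozy-kubevirt get kubevirt kubevirt \
  -o jsonpath='{.status.observedKubeVirtVersion}{"\n"}'
```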

After the upgrade reaches `Ready=True`, do the phased cold-restart:

```bash
# 6. Build the worklist of VMIs to restart, excluding any namespaces that
# must be left alone (set EXCLUDED_NS as needed; the awk below skips them).
EXCLUDED_NS=""  # comma-separated list of namespaces to exclude
kubectl get vmi -A --no-headers \
| awk -v ex="$EXCLUDED_NS" '
BEGIN { n=split(ex,e,","); for (i in e) skip[e[i]]=1 }
$4 == "Running" && !($1 in skip) { print $1"/"$2 }' \
> /tmp/vms-to-restart.txt
wc -l /tmp/vms-to-restart.txt

# 7. Restart each VMI in turn at 30s spacing: deleting the launcher pod makes
# KubeVirt bring the VM back up in a new launcher on the now-current image.
# Per-VM downtime ~30-60s.
while read entry; do
ns="${entry%%/*}"
vmi="${entry##*/}"
pod=$(kubectl -n "$ns" get pods -l kubevirt.io=virt-launcher,vm.kubevirt.io/name="$vmi" \
  --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)

if [ -n "$pod" ]; then
echo "$(date +%H:%M:%S) restart $ns/$vmi (pod $pod)"
kubectl -n "$ns" delete pod "$pod" --wait=false
fi
sleep 30
done < /tmp/vms-to-restart.txt
```

**Pacing.** 30s spacing × N VMs ≈ total wall time: 161 VMs × 30s ≈ 80 min of sleeps, so ~85 min once per-VM kubectl overhead is added. Tighter spacing risks storage IO surges (DRBD/LINSTOR resyncs). Loosen if storage is hot, tighten if the maintenance window is short.

After the loop:

```bash
# 8. Verify everything landed on the new launcher image
kubectl get pods -l kubevirt.io=virt-launcher -A \
-o jsonpath='{range .items[*]}{.spec.containers[?(@.name=="compute")].image}{"\n"}{end}' \
| sort | uniq -c
# expected: only excluded VMs (if any) remain on the old image

# 9. Confirm no VMI is wedged in a non-Running phase
kubectl get vmi -A --no-headers \
| awk '$4 != "Running" && $4 != "Pending"'
# expected: no output
```
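
If step 8 shows stragglers, a sketch for naming them (assumes all launchers ran a single image pre-upgrade, so the first entry in the snapshot is representative):

```bash
# derive the old image from the pre-upgrade snapshot, then list VMs still on it
OLD_IMAGE="$(cut -f2 /tmp/launchers-pre-upgrade.txt | sort -u | head -1)"
kubectl get pods -l kubevirt.io=virt-launcher -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.containers[?(@.name=="compute")].image}{"\n"}{end}' \
  | awk -F'\t' -v img="$OLD_IMAGE" '$2 == img {print $1}'
```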

### Steady state

If any VMs were intentionally skipped (e.g. tenants who couldn't take downtime in this window), leave `workloadUpdateMethods` empty until those VMs are restarted naturally. Once the cluster is uniformly on the new launcher image:

```bash
kubectl -n cozy-kubevirt patch hr kubevirt --type=merge \
-p '{"spec":{"suspend":false}}'

kubectl -n cozy-kubevirt patch kubevirt kubevirt --type=merge \
-p '{"spec":{"workloadUpdateStrategy":{"workloadUpdateMethods":["LiveMigrate","Evict"]}}}'
```

### Coordination with the user

Before starting, communicate clearly:

- Every VM (except explicit opt-outs) will get **one** ~30-60s downtime during the restart loop.
- The order is alphabetical by namespace; rough ETA is ~30s per VM (a schedule sketch follows this list).
- Tenants with HA workloads on top of single VMIs (e.g. single-replica databases) should be warned individually if their app can't tolerate a brief restart.
- Tenants who need to defer should be added to the exclusion list; their VM will keep running on the old QEMU until they restart it themselves.
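
A sketch for turning the worklist into a rough per-VM schedule to share with tenants (assumes the 30s spacing used in the restart loop):

```bash
# print "ns/vm  T+<minutes>" based on position in the worklist
nl -ba /tmp/vms-to-restart.txt \
  | awk '{printf "%s\tT+%.1f min\n", $2, ($1 - 1) * 30 / 60}'
```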

## Diagnostic quick reference

| Question | Command |