Add known issue for IP address change upgrading to v1.7 when using DHCP #922
Conversation
Force-pushed 4dded6c to 39cb6c7
The actual doc looks good. Just wondering if we should advise users to check this before the upgrade and write the 91-NetworkManager file before triggering it? If a user runs into this issue during upgrade and the node is only accessible via a serial console, generating this file will be cumbersome. Alternatively, they could temporarily use a different IP address to allow SSH access to the nodes.
I'm not sure we can write the 91_networkmanager file before upgrade -- it's generated by the harvester-installer binary from v1.7, which is available inside the upgrade container, but not really anywhere else. That file gets generated when upgrade_node.sh runs, so it happens during the upgrade, but before the node is rebooted and thus before the problem can occur. So the file will be there and just needs that little tweak, but you're right, getting to the node when its IP address has changed could be irritating...
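If we do end up advising a pre-upgrade check, the old client IDs are easy to capture while wicked is still running, since they live in the lease files this doc already points at. A rough sketch (the temp directory and sample lease are purely illustrative; on a real pre-1.7 node you would grep the files under `/var/lib/wicked` directly):

```shell
# Sketch of a pre-upgrade step: record the current DHCP client IDs so they
# are still available if a node later comes up with a different IP address.
# On a real node the lease files live under /var/lib/wicked; this demo
# writes a sample lease (same shape as shown in the doc) to a temp dir.
demo=$(mktemp -d)
cat > "$demo/lease-mgmt-br-dhcp-ipv4.xml" <<'EOF'
<lease>
  <ipv4:dhcp>
    <client-id>ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05</client-id>
  </ipv4:dhcp>
</lease>
EOF
# Print the client-id line from every DHCPv4 lease file found
grep -h '<client-id>' "$demo"/lease-*-dhcp-ipv4.xml
```

Saving that output somewhere off-node before the upgrade would avoid the serial-console dance entirely.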
Related-to: harvester/harvester#9260
Signed-off-by: Tim Serong <[email protected]>

Co-authored-by: Daria Vladykina <[email protected]>
Signed-off-by: Tim Serong <[email protected]>
Force-pushed da9980f to dacf172
jillian-maroket left a comment
Review done
### 1. Host IP address may change during upgrade when using DHCP
Harvester v1.7.x uses NetworkManager instead of wicked, which was used in earlier versions of Harvester. These two network stacks have different defaults for generating DHCP client IDs. This means that if you are using DHCP to configure your host IP addresses, after the operating system on each host is upgraded and the host rebooted, your DHCP server may return a different IP address for that host than it did before. If this happens, the host in question will be unable to join the cluster on startup because its IP address has changed.
```suggestion
Harvester v1.7.x uses NetworkManager instead of wicked, which was used in earlier versions of Harvester. These two network stacks have different defaults for generating DHCP client IDs.
If the host IP addresses are configured using DHCP, a Harvester upgrade and subsequent reboot may cause the DHCP server to assign IP addresses that are different from what hosts previously used. Consequently, the affected hosts are unable to join the cluster on startup because of the IP address change.
```
This problem will not occur if your DHCP server is configured to allocate fixed IP addresses based on MAC address, as is done in [Harvester iPXE Examples](https://github.com/harvester/ipxe-examples). However, it will occur if the DHCP server is allocating IP addresses based solely on DHCP client ID.
```suggestion
This issue typically occurs when the DHCP server allocates IP addresses based solely on the DHCP client ID. You are unlikely to encounter this issue when the DHCP server is configured to allocate fixed IP addresses based on the MAC address (as demonstrated in the [Harvester iPXE Examples](https://github.com/harvester/ipxe-examples)).
```
For single-node Harvester deployments that have this issue, Harvester simply will not start after rebooting after the upgrade, because the IP address is changed. For multi-node deployments, you may find management nodes are stuck "Waiting Reboot". In both cases, to address this issue, perform the following steps _after_ each node is upgraded and its IP address has changed:
```suggestion
The impact of this issue varies by cluster size:
- Single-node clusters: Harvester fails to start after rebooting because the IP address has changed.
- Multi-node clusters: Management nodes become stuck in the "Waiting Reboot" state.
To address the issue, perform the following steps:
:::info important
You must perform the steps for each affected node _after_ the upgrade is completed and the IP address has changed.
:::
```
1. Log in to the affected node, either via `ssh` to its new IP address, or by using the console.
1. Check for lease XML file in the `/var/lib/wicked` directory. It should be named similar to `/var/lib/wicked/lease-mgmt-br-dhcp-ipv4.xml`. If you are using a VLAN, the file name will include the VLAN ID, for example, `/var/lib/wicked/lease-mgmt-br.2017-dhcp-ipv4.xml`.
```suggestion
1. Log in to the affected node. You can either access the node via SSH at its new IP address or use the console.
1. In the `/var/lib/wicked` directory, check for the lease XML file (named similar to `/var/lib/wicked/lease-mgmt-br-dhcp-ipv4.xml`).
   If you are using a VLAN, the file name includes the VLAN ID (for example, `/var/lib/wicked/lease-mgmt-br.2017-dhcp-ipv4.xml`).
```
1. View this file to find the DHCP client ID:
   ```
   $ cat /var/lib/wicked/lease-mgmt-br-dhcp-ipv4.xml
   <lease>
   ...
   <ipv4:dhcp>
   <client-id>ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05</client-id>
   ...
   </ipv4:dhcp>
   </lease>
   ```
````suggestion
1. View the file and identify the DHCP client ID.
   ```
   $ cat /var/lib/wicked/lease-mgmt-br-dhcp-ipv4.xml
   <lease>
   ...
   <ipv4:dhcp>
   <client-id>ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05</client-id>
   ...
   </ipv4:dhcp>
   </lease>
   ```
````
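Minor aside on this step: rather than eyeballing the XML, the value can be captured into a shell variable for use in the next step. A sketch under the assumption that the lease file has the shape shown above (the demo works on a sample copy, since the real file only exists on the node):

```shell
# Capture the <client-id> value from a wicked lease file with sed.
# The demo uses a sample copy; on the node you would point this at
# /var/lib/wicked/lease-mgmt-br-dhcp-ipv4.xml (or the VLAN variant).
lease=$(mktemp)
cat > "$lease" <<'EOF'
<lease>
  <ipv4:dhcp>
    <client-id>ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05</client-id>
  </ipv4:dhcp>
</lease>
EOF
# Print only the text between <client-id> and </client-id>
client_id=$(sed -n 's|.*<client-id>\(.*\)</client-id>.*|\1|p' "$lease")
echo "$client_id"
```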
1. Edit the `/oem/91_networkmanager.yaml` file and add the DHCP client ID to the content of the appropriate NetworkManager connection profile inside that file. If you are not using a VLAN, use the `bridge-mgmt.nmconnection` section. If you are using a VLAN, use `vlan-mgmt.nmconnection`. In either case, add `dhcp-client-id=CLIENT_ID_FROM_WICKED_LEASE_FILE` below the `[ipv4]` line, for example:
   ```
   $ cat /oem/91_networkmanager.yaml
   name: Harvester Network Configuration
   stages:
     initramfs:
     - files:
       ...
       - path: /etc/NetworkManager/system-connections/bridge-mgmt.nmconnection
         ...
         content: |
           ...
           [ipv4]
           dhcp-client-id=ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05
           method=auto
           ...
   ```
````suggestion
1. Edit the `/oem/91_networkmanager.yaml` file and add the DHCP client ID to the appropriate NetworkManager connection profile within that file.
   The section you need to modify depends on whether your node uses a VLAN.
   - No VLAN: Add the DHCP client ID to the `bridge-mgmt.nmconnection` section.
   - VLAN used: Add the DHCP client ID to the `vlan-mgmt.nmconnection` section.
   In either case, you must add `dhcp-client-id=CLIENT_ID_FROM_WICKED_LEASE_FILE` below the `[ipv4]` line. Replace `CLIENT_ID_FROM_WICKED_LEASE_FILE` with the actual client ID.
   Example:
   ```
   $ cat /oem/91_networkmanager.yaml
   name: Harvester Network Configuration
   stages:
     initramfs:
     - files:
       ...
       - path: /etc/NetworkManager/system-connections/bridge-mgmt.nmconnection
         ...
         content: |
           ...
           [ipv4]
           dhcp-client-id=ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05
           method=auto
           ...
   ```
````
1. Reboot the node. The DHCP server should now return the original IP address and the affected node should be able to join the cluster.
1. Repeat as necessary for each remaining node.
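For anyone scripting this workaround, the edit in step 4 boils down to inserting one line after `[ipv4]` with matching indentation. A rough sketch, not part of the documented procedure (the sample file and `client_id` value are illustrative; the real file is `/oem/91_networkmanager.yaml` and the indentation must match the surrounding profile content):

```shell
# Sketch: insert the dhcp-client-id line after the [ipv4] line of the
# relevant connection profile. The demo edits a shortened sample file
# rather than the real /oem/91_networkmanager.yaml.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
        [ipv4]
        method=auto
EOF
client_id=ff:00:dd:c7:05:00:01:00:01:30:ae:a0:d3:52:54:00:dd:c7:05
awk -v id="$client_id" '
  { print }
  /\[ipv4\]/ { print "        dhcp-client-id=" id }  # emit new line right after [ipv4]
' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
cat "$cfg"
```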
```suggestion
1. Reboot the node.
   The DHCP server should return the original IP address and the affected node should be able to join the cluster.
```
I removed the last step because I added the following to the procedure intro:
:::info important
You must perform the steps for each affected node _after_ the upgrade is completed and the IP address has changed.
:::
:::info important
If you are using DHCP to configure your host IP addresses, the IP addresses may change during upgrade, which will prevent the cluster from starting correctly. This requires manual intervention to remedy. For full details, see [Host IP address may change during upgrade when using DHCP](#1-host-ip-address-may-change-during-upgrade-when-using-dhcp).
:::
```suggestion
:::info important
Host IP addresses configured via DHCP may change during upgrades. This prevents the cluster from starting correctly and requires manual recovery steps. For details, see [Host IP address may change during upgrade when using DHCP](#1-host-ip-address-may-change-during-upgrade-when-using-dhcp).
:::
```
Problem:
If you are using DHCP to configure your host IP addresses, the IP addresses may change during the upgrade to v1.7.0, which will prevent the cluster from starting correctly. This requires manual intervention to remedy.
Solution:
The IP address change can be fixed by editing the generated NetworkManager connection profile to include the old (pre-upgrade) DHCP client ID. Note that the example files included in the docs here are deliberately shortened (removed lines are indicated by `...`) to highlight only the relevant parts.
Related Issue(s):
harvester/harvester#9260
Test plan:
See reproducer in harvester/harvester#9260. Try applying the workaround as documented here in that environment. It should work :-)