Member

@w13915984028 w13915984028 commented Aug 13, 2025

How a VM is marked as non-migratable
How VMs are processed during the upgrade process
How VMs are processed during the node maintenance process
How the Migrate menu becomes available
How hot-plug works with live migration

Problem:

Live migration involves upgrade, node maintenance, VM creation, VM operations, hot-plug, and more.

The documents are not always up to date or cross-referenced.

Solution:

Update the document.

Related Issue(s):

harvester/harvester#8823

Test plan:

Additional documentation or context

Refer to
https://github.com/harvester/harvester/blob/b9a3dd4b2b5ede2648b4d76e339c436bff1aa987/pkg/util/virtualmachineinstance/virtualmachineinstance.go#L17 (used by upgrade)

and
https://github.com/harvester/harvester/blob/b9a3dd4b2b5ede2648b4d76e339c436bff1aa987/pkg/util/virtualmachineinstance/virtualmachineinstance.go#L70 (used by canMigrate).

There are minor differences between the two.

Issue harvester/harvester#7128 is also updated in this PR.

github-actions bot commented Aug 13, 2025

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5b9940c |
| 😎 Deploy Preview | https://68a8306eb48f7600a92289d9--harvester-preview.netlify.app |

martindekov previously approved these changes Aug 13, 2025

@martindekov martindekov left a comment

Left some nits, overall LGTM!

@w13915984028 w13915984028 force-pushed the doc8823 branch 3 times, most recently from 7c1b590 to 91034b9 Compare August 13, 2025 15:18
@w13915984028
Member Author

To reviewers:

As issue harvester/harvester#8823 mentions, live migration is related to many concepts in Harvester. I tried to link them but might still have missed some. Please help review and add your comments. Thanks.

Contributor

@ihcsim ihcsim left a comment

Good content. I think, if possible, we should concentrate the info on the upgrade and live migration pages. Specifically, how upgrade can potentially disrupt some workloads, what Harvester is doing to help mitigate these disruptions and if things go wrong, what can the users do.

Right now, I am not sure if we need to repeat live migration info in the vm, volume, storage class pages. I'll need to re-read it again.

Member

@brandboat brandboat left a comment

@w13915984028, the picture link seems wrong. I saw the error below:
Error: Image static/img/v1.6/vm/batch-migrations.png used in docs/vm/live-migration.md not found.

brandboat previously approved these changes Aug 14, 2025
Member

@brandboat brandboat left a comment

Overall LGTM, thanks!

FrankYang0529 previously approved these changes Aug 14, 2025
Contributor

@jillian-maroket jillian-maroket left a comment

Initial review done

Contributor

@ihcsim ihcsim left a comment

Thanks - a few suggestions and nits to help clarify the context and intention.

@w13915984028
Member Author

w13915984028 commented Aug 21, 2025

> I want to better understand why user needs to know the difference between system-initiated (upgrade, hot-plug etc.) and user-initiated (UI, kubectl, maintenance) migration. Are there certain things they can or cannot do with each?

@ihcsim I'll try to break down the four kinds of migration in v1.6.0:

(1) Migrate menu: straightforward; the reverse operation is Abort Migration.
(2) Hot-plug: the user clicks Edit Config on the VM and then applies the change. After that, there is no clear information about the progress or result of the hot-plug; all the user can see is that the VM is migrating.
(3) Node maintenance: after a node is put into maintenance, there is also no clear information about the progress before the node finally enters maintenance mode. If it gets stuck, the solution is to check the Harvester pod; refer to harvester/harvester#8856.
(4) Upgrade: the most visible information is that a node is upgrading, but there are no further details (for example, that the node upgrade is at the VM migration stage, or that n VMs still need to be migrated) unless you check the Harvester log. There has been an issue where a user aborted a migration triggered by an upgrade.

Some users are keen to know those details. The doc saves us from explaining them again and again, and even Harvester engineers can benefit from them.

(2), (3), and (4) need improvements beyond harvester/harvester#8924 and this doc PR (which also can't cover all cases at the moment, like this open question: #849 (comment)).

Suppose a confirmation pop-up appeared when the user clicks Abort Migration, saying:
"This migration was triggered by node maintenance. Are you sure you want to abort?"
That would be much better.
harvester/harvester#8943

Contributor

@jillian-maroket jillian-maroket left a comment

Second review done.

@ihcsim I agree that we are getting into the weeds here. If folks decide that such info must be added to the doc, we can store the info in pages dedicated to reference content. Users shouldn't be forced to wade through implementation details when they're simply trying to complete tasks. This approach will also help us tame the out-of-control linking that makes the doc difficult to maintain.


:::tip
Create a backup or snapshot for each non-migratable VM before modifying the settings that bind it to the node that you want to remove.
- Create a backup or snapshot for each non-migratable virtual machine.
- Change the virtual machines to let them run on other nodes as more as possible.
Contributor

Are there any missing words here? What do you mean by 'change the virtual machines'?

Member Author

@w13915984028 w13915984028 Aug 22, 2025

'change the virtual machines'

Yes, it is not fully expressed. As the non-migratable VM section mentions:

Users can try to remove the node-specific volumes/devices, remove the node selector ... so that the VM can finally be scheduled to other nodes, if possible. Because the current node is going to be deleted, any VM that is not saved or migrated will be lost.
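
To make the idea in the comment above concrete, here is a minimal sketch of a check that lists what currently ties a VM to its node. The dictionary shape and the key names (`node_selector`, `node_specific_volumes`, `node_specific_devices`) are hypothetical simplifications chosen for illustration; they are not the actual VM spec fields, and this is not how Harvester itself performs the check.

```python
def node_binding_reasons(vm):
    """Return the reasons a VM is tied to its current node.

    `vm` is a simplified, hypothetical dictionary, not a real VM manifest.
    An empty result means nothing in this sketch prevents rescheduling.
    """
    reasons = []
    if vm.get("node_selector"):
        reasons.append("node selector")
    if vm.get("node_specific_volumes"):
        reasons.append("node-specific volumes")
    if vm.get("node_specific_devices"):
        reasons.append("node-specific devices")
    return reasons


# Hypothetical example: a VM pinned by a node selector and a local device.
pinned_vm = {
    "name": "vm1",
    "node_selector": {"kubernetes.io/hostname": "node1"},
    "node_specific_devices": ["local-device-1"],
}
print(node_binding_reasons(pinned_vm))
```

Each reported reason corresponds to something the user would need to remove before the VM could be scheduled to another node.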

Comment on lines 145 to 147
- The **Abort Migration** menu item is available when the virtual machine already has a running or pending migration process.

- Don't click `Abort Migration` if it is created by the [batch-migrations](#automatically-triggered-batch-migrations). See [The `VirtualMachineInstanceMigration` Object](#the-virtualmachineinstancemigration-object) for more details.
Contributor

Suggested change
- The **Abort Migration** menu item is available when the virtual machine already has a running or pending migration process.
- Don't click `Abort Migration` if it is created by the [batch-migrations](#automatically-triggered-batch-migrations). See [The `VirtualMachineInstanceMigration` Object](#the-virtualmachineinstancemigration-object) for more details.
The **Abort Migration** menu item is already running or has a pending migration process.
Do not use this UI feature if the migration process was created using [batch migration](#automatically-triggered-batch-migration). For more information, see [`VirtualMachineInstanceMigration` Object](#virtualmachineinstancemigration-object).

Member Author

> The Abort Migration menu item is already running or has a pending migration process.

This description is not accurate. It needs to convey the following:

When the VM already has a related VirtualMachineInstanceMigration object (created by the Migrate menu, batch migration, or something else) and that object is in the running or pending state (other states include failed, successful, and aborted), the VM shows the Abort Migration menu item.
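
The rule described in this comment can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name, the dictionary shape, and the lowercase state strings are hypothetical and taken from the wording of the comment, not from the Harvester UI code or the exact phase names on the real object.

```python
# States named in the comment above; the real object's phase names may differ.
ACTIVE_STATES = {"running", "pending"}


def shows_abort_migration(vm_name, migrations):
    """Return True if the VM has at least one running or pending migration.

    `migrations` is a list of simplified, hypothetical records of the form
    {"vm": <vm name>, "state": <state string>}.
    """
    return any(
        m["vm"] == vm_name and m["state"].lower() in ACTIVE_STATES
        for m in migrations
    )


# Hypothetical data: vm1 is migrating, vm2 only has a finished migration.
sample_migrations = [
    {"vm": "vm1", "state": "running"},
    {"vm": "vm2", "state": "failed"},
]
```

Note that who created the migration object (the Migrate menu, a batch migration, or something else) plays no part in the check, which matches the point made in the comment.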

@w13915984028
Member Author

Sorry, I can't agree with the point that the document is overly complex.

In my long experience with customer support, troubleshooting, and community coordination, what I have heard is that the documentation is missing details, outdated, or even wrong. Complaints that the documentation is too complex are very rare.

@w13915984028 w13915984028 requested a review from votdev August 22, 2025 06:50
@w13915984028
Member Author

@votdev You will be online next week. Please also help review and check whether the PR meets the expectations of issue harvester/harvester#8823. Thanks.

  How VM is marked as non-migratable
  How VM is processed on upgrade process
  How VM is processed on node maintenance process
  How `migrate` menu comes

Signed-off-by: Jian Wang <[email protected]>
@w13915984028
Member Author

FYI:
The PR was rebased onto the head of master today, and two open questions remain:
#849 (comment)
#849 (comment)

@ihcsim
Contributor

ihcsim commented Aug 22, 2025

@w13915984028 I think the word "bloated" is more aligned with what Jillian and I are trying to communicate about the "How Migration Works" section. It's not "complex", especially after we spent the time helping you clarify the wording to express your intention in the doc.

We think that when a user reads the "Live Migration" documentation, their immediate goal is to use live migration, not to understand or troubleshoot live migration in-depth. Hence, we recommended moving the content somewhere else. For example, can we have a new architecture page for live migration, can we move the "How Migration Works" section to the lower half of the same page etc.? We didn't say remove it completely. But since you are committed to the current layout, I said in one of my comments above that you can keep it as-is.

I hope this makes sense.

@jillian-maroket
Contributor

If anybody wants to approve the PR so it can be merged, please go ahead. cc: @ihcsim @Vicente-Cheng

Member

@votdev votdev left a comment

LGTM

@w13915984028 w13915984028 merged commit d2c47ee into harvester:main Aug 26, 2025
4 checks passed
@w13915984028
Member Author

w13915984028 commented Aug 26, 2025

Thanks to all reviewers. It has taken a long time to write and review. I have merged the PR for now.

The extensive details might not meet all the requirements/formatting/expectations of all of us, but we also believe those details are helpful to end users.

Let's keep enhancing the doc. Thanks again.
