Fix creating cluster when using harvester cluster name instead of id#101
Fix creating cluster when using harvester cluster name instead of id#101iamtrazy wants to merge 2 commits into
Conversation
📝 WalkthroughSummaryFixes Harvester cluster provisioning by ensuring the Rancher v2 cluster lookup uses the Harvester cluster ID rather than the cluster name. Changes
File modified: modules/workloads/k8s-cluster/main.tf WalkthroughThe PR adds a new Rancher project role template Sequence Diagram(s)sequenceDiagram
participant User
participant Terraform
participant Rancher
participant HarvesterCluster
User->>Terraform: apply with vm/storage VLANs & groups
Terraform->>Rancher: create role template (project-member-restricted)
Terraform->>Rancher: create namespace, harvester_network.vm/storage
Terraform->>Rancher: create project role bindings for shared images
Terraform->>HarvesterCluster: lookup cluster by upstream ID
Terraform->>Nodes: render cloud-init with enable_storage_netplan (when applicable)
Nodes->>Nodes: netplan generate / apply (conditional)
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/management/tenant-space/main.tf (1)
136-169:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject configs that set both
vlan_idandvm_network_vlan_id.These two paths create VM
harvester_networkresources in the same namespace and use the same default naming convention. A caller that keepsvlan_idpopulated while adopting the new single-VLAN input can end up with duplicate or conflicting network creation at apply time. Add a module precondition to make the inputs mutually exclusive.Suggested change
lifecycle { ignore_changes = [container_resource_limit] + precondition { + condition = var.vm_network_vlan_id == null || var.vlan_id == null || length(var.vlan_id) == 0 + error_message = "vm_network_vlan_id and vlan_id are mutually exclusive. Use vm_network_vlan_id for the simple single-VLAN path, or vlan_id for the legacy/multi-VLAN path." + } precondition { condition = var.cpu_limit != null || alltrue([ var.memory_limit == null,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modules/management/tenant-space/main.tf` around lines 136 - 169, Add a validation/precondition that rejects configs where both inputs are set by asserting that not both var.vlan_id and var.vm_network_vlan_id are non-null; e.g., add a validation block (on var.vm_network_vlan_id or as a top-level precondition) with condition = !(var.vlan_id != null && var.vm_network_vlan_id != null) and a clear error_message explaining the mutual-exclusion, so the module never tries to create overlapping harvester_network resources (harvester_network.tenant and harvester_network.vm).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modules/management/cluster-roles/main.tf`:
- Around line 385-438: The role currently allows creating namespaces (the rules
block with api_groups = [""] and resources = ["namespaces"]) but lacks read-only
visibility for namespaces and project objects; add two rules: one granting
get/list/watch on core namespaces (api_groups = [""], resources =
["namespaces"], verbs = ["get","list","watch"]) and one granting get/list/watch
on projects in management.cattle.io (api_groups = ["management.cattle.io"],
resources = ["projects"], verbs = ["get","list","watch"]) so this role can act
as a drop-in replacement for project-member; place them alongside the other
rules blocks (e.g., near the existing namespaces create rule and the
clusters/projects rules).
In `@modules/management/tenant-space/variables.tf`:
- Around line 207-214: The variable enable_shared_image_access currently
defaults to true which causes existing tenant spaces to automatically gain
shared-image bindings on apply; change its default to false in the variables.tf
variable block for enable_shared_image_access and update the variable
description to reflect the new default (opt-in behavior) so existing tenants are
not affected unless they explicitly enable it.
In `@modules/workloads/k8s-cluster/main.tf`:
- Around line 18-35: The generated user_data currently computes a single global
local._inject_storage_netplan and _generated_node_user_data which gets applied
to all pools; change the template rendering to happen per-pool so
enable_storage_netplan is computed from each.value.storage_network and the
global user_data check. Specifically, remove usage of the global
_generated_node_user_data for pools and instead call
templatefile("${path.module}/templates/node-cloud-init.tpl", { ...,
enable_storage_netplan = local._using_generated_user_data &&
(each.value.storage_network != null) }) inside the resource/for_each that
iterates var.machine_pools (use each.value and each.key), and ensure
var.user_data still takes precedence (skip template when var.user_data is
non-empty). This keeps var.user_data, var.node_password,
var.ssh_authorized_keys, var.ntp_server, ssh_user etc. passed through per-pool.
In `@modules/workloads/k8s-cluster/templates/node-cloud-init.tpl`:
- Around line 46-49: The template currently hardcodes the storage interface as
"enp2s0" causing wrong NIC selection; change the template to render the storage
interface name dynamically by accepting a template variable (e.g.,
storage_interface or storage_iface_index) from Terraform and using that variable
in place of "enp2s0" in the node-cloud-init.tpl block where enp2s0 is
referenced; update the Terraform module that calls this template to compute/pass
the correct interface name or index (based on vm_network/networks) into that
variable so the rendered cloud-init uses the correct storage NIC.
In `@modules/workloads/k8s-cluster/variables.tf`:
- Around line 108-110: Add validation to the variables describing network refs
so empty strings and duplicates are rejected: for vm_network, storage_network
and networks (the variables vm_network, storage_network, networks used by
machine_pools) add a validation block that (1) ensures any provided string is
non-empty and any provided list element is non-empty (use checks on each value),
and (2) builds the combined list via concat(vm_network if set, networks,
storage_network if set) then uses compact/trim and distinct to assert
length(distinct(compact(...))) == length(compact(...)) to enforce uniqueness;
update the machine_pools input validation to reference the same rule so repeated
or empty network refs are rejected before downstream interface creation.
---
Outside diff comments:
In `@modules/management/tenant-space/main.tf`:
- Around line 136-169: Add a validation/precondition that rejects configs where
both inputs are set by asserting that not both var.vlan_id and
var.vm_network_vlan_id are non-null; e.g., add a validation block (on
var.vm_network_vlan_id or as a top-level precondition) with condition =
!(var.vlan_id != null && var.vm_network_vlan_id != null) and a clear
error_message explaining the mutual-exclusion, so the module never tries to
create overlapping harvester_network resources (harvester_network.tenant and
harvester_network.vm).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 77da7fd0-f87e-4b7e-b922-85fd12b85996
📒 Files selected for processing (7)
modules/management/cluster-roles/main.tfmodules/management/cluster-roles/outputs.tfmodules/management/tenant-space/main.tfmodules/management/tenant-space/variables.tfmodules/workloads/k8s-cluster/main.tfmodules/workloads/k8s-cluster/templates/node-cloud-init.tplmodules/workloads/k8s-cluster/variables.tf
| rules { | ||
| api_groups = [""] | ||
| resources = ["namespaces"] | ||
| verbs = ["create"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = [""] | ||
| resources = ["persistentvolumes"] | ||
| verbs = ["get", "list", "watch"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["storage.k8s.io"] | ||
| resources = ["storageclasses"] | ||
| verbs = ["get", "list", "watch"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["apiregistration.k8s.io"] | ||
| resources = ["apiservices"] | ||
| verbs = ["get", "list", "watch"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = [""] | ||
| resources = ["persistentvolumeclaims"] | ||
| verbs = ["*"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["metrics.k8s.io"] | ||
| resources = ["pods"] | ||
| verbs = ["*"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["management.cattle.io"] | ||
| resources = ["clusterevents"] | ||
| verbs = ["get", "list", "watch"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["catalog.cattle.io"] | ||
| resources = ["clusterrepos", "operations", "releases", "apps"] | ||
| verbs = ["get", "list", "watch"] | ||
| } | ||
|
|
||
| rules { | ||
| api_groups = ["management.cattle.io"] | ||
| resources = ["clusters"] | ||
| resource_names = ["local"] | ||
| verbs = ["get"] | ||
| } |
There was a problem hiding this comment.
Add read-only project and namespace visibility to this replacement role.
This role can create namespaces, but it cannot read namespaces or the project object itself. That makes it an incomplete drop-in replacement for project-member and can block basic project visibility for users bound only to this role.
Proposed fix
rules {
api_groups = [""]
resources = ["namespaces"]
- verbs = ["create"]
+ verbs = ["get", "list", "watch", "create"]
}
+
+ rules {
+ api_groups = ["management.cattle.io"]
+ resources = ["projects"]
+ verbs = ["get", "list", "watch"]
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rules { | |
| api_groups = [""] | |
| resources = ["namespaces"] | |
| verbs = ["create"] | |
| } | |
| rules { | |
| api_groups = [""] | |
| resources = ["persistentvolumes"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["storage.k8s.io"] | |
| resources = ["storageclasses"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["apiregistration.k8s.io"] | |
| resources = ["apiservices"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = [""] | |
| resources = ["persistentvolumeclaims"] | |
| verbs = ["*"] | |
| } | |
| rules { | |
| api_groups = ["metrics.k8s.io"] | |
| resources = ["pods"] | |
| verbs = ["*"] | |
| } | |
| rules { | |
| api_groups = ["management.cattle.io"] | |
| resources = ["clusterevents"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["catalog.cattle.io"] | |
| resources = ["clusterrepos", "operations", "releases", "apps"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["management.cattle.io"] | |
| resources = ["clusters"] | |
| resource_names = ["local"] | |
| verbs = ["get"] | |
| } | |
| rules { | |
| api_groups = [""] | |
| resources = ["namespaces"] | |
| verbs = ["get", "list", "watch", "create"] | |
| } | |
| rules { | |
| api_groups = ["management.cattle.io"] | |
| resources = ["projects"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = [""] | |
| resources = ["persistentvolumes"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["storage.k8s.io"] | |
| resources = ["storageclasses"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["apiregistration.k8s.io"] | |
| resources = ["apiservices"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = [""] | |
| resources = ["persistentvolumeclaims"] | |
| verbs = ["*"] | |
| } | |
| rules { | |
| api_groups = ["metrics.k8s.io"] | |
| resources = ["pods"] | |
| verbs = ["*"] | |
| } | |
| rules { | |
| api_groups = ["management.cattle.io"] | |
| resources = ["clusterevents"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["catalog.cattle.io"] | |
| resources = ["clusterrepos", "operations", "releases", "apps"] | |
| verbs = ["get", "list", "watch"] | |
| } | |
| rules { | |
| api_groups = ["management.cattle.io"] | |
| resources = ["clusters"] | |
| resource_names = ["local"] | |
| verbs = ["get"] | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/management/cluster-roles/main.tf` around lines 385 - 438, The role
currently allows creating namespaces (the rules block with api_groups = [""] and
resources = ["namespaces"]) but lacks read-only visibility for namespaces and
project objects; add two rules: one granting get/list/watch on core namespaces
(api_groups = [""], resources = ["namespaces"], verbs = ["get","list","watch"])
and one granting get/list/watch on projects in management.cattle.io (api_groups
= ["management.cattle.io"], resources = ["projects"], verbs =
["get","list","watch"]) so this role can act as a drop-in replacement for
project-member; place them alongside the other rules blocks (e.g., near the
existing namespaces create rule and the clusters/projects rules).
| # Backwards-compatible: defaults to false, so existing tenant spaces are | ||
| # completely unaffected unless they explicitly opt in. | ||
|
|
||
| variable "enable_shared_image_access" { | ||
| type = bool | ||
| description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)." | ||
| default = true | ||
| } |
There was a problem hiding this comment.
Keep shared image access opt-in.
Defaulting this to true makes existing tenants with group_role_bindings pick up new shared-image bindings on the next apply, which is a behavior change rather than an optional feature. Set the default to false, or document this as an intentional breaking change.
Suggested change
variable "enable_shared_image_access" {
type = bool
description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)."
- default = true
+ default = false
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Backwards-compatible: defaults to false, so existing tenant spaces are | |
| # completely unaffected unless they explicitly opt in. | |
| variable "enable_shared_image_access" { | |
| type = bool | |
| description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)." | |
| default = true | |
| } | |
| # Backwards-compatible: defaults to false, so existing tenant spaces are | |
| # completely unaffected unless they explicitly opt in. | |
| variable "enable_shared_image_access" { | |
| type = bool | |
| description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)." | |
| default = false | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/management/tenant-space/variables.tf` around lines 207 - 214, The
variable enable_shared_image_access currently defaults to true which causes
existing tenant spaces to automatically gain shared-image bindings on apply;
change its default to false in the variables.tf variable block for
enable_shared_image_access and update the variable description to reflect the
new default (opt-in behavior) so existing tenants are not affected unless they
explicitly enable it.
| # Inject the storage interface netplan (enp2s0 / use-routes: false) when: | ||
| # - at least one pool declares a storage_network, AND | ||
| # - the caller is using generated user_data (not an explicit user_data string). | ||
| # Callers passing explicit user_data are responsible for the netplan themselves. | ||
| _using_generated_user_data = var.user_data == null || var.user_data == "" | ||
| _any_pool_has_storage = anytrue([for p in var.machine_pools : p.storage_network != null]) | ||
| _inject_storage_netplan = local._any_pool_has_storage && local._using_generated_user_data | ||
|
|
||
| # Generate node cloud-init from first-class variables when user_data is not provided. | ||
| # user_data (non-empty string) takes full precedence — generation is skipped entirely. | ||
| _generated_node_user_data = (var.node_password != null || length(var.ssh_authorized_keys) > 0 || var.ntp_server != "") ? templatefile( | ||
| _generated_node_user_data = (var.node_password != null || length(var.ssh_authorized_keys) > 0 || var.ntp_server != "" || local._inject_storage_netplan) ? templatefile( | ||
| "${path.module}/templates/node-cloud-init.tpl", | ||
| { | ||
| ssh_user = var.ssh_user | ||
| node_password = var.node_password | ||
| ssh_authorized_keys = var.ssh_authorized_keys | ||
| ntp_server = var.ntp_server | ||
| ssh_user = var.ssh_user | ||
| node_password = var.node_password | ||
| ssh_authorized_keys = var.ssh_authorized_keys | ||
| ntp_server = var.ntp_server | ||
| enable_storage_netplan = local._inject_storage_netplan |
There was a problem hiding this comment.
Generate storage netplan flags per pool, not globally.
Line 23-Line 35 computes one global _inject_storage_netplan, and Line 127 applies the resulting user_data to every pool. In mixed configurations, pools without storage_network still receive storage netplan settings. Move template rendering to pool scope (each.value) so enable_storage_netplan is evaluated per pool.
Also applies to: 127-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/workloads/k8s-cluster/main.tf` around lines 18 - 35, The generated
user_data currently computes a single global local._inject_storage_netplan and
_generated_node_user_data which gets applied to all pools; change the template
rendering to happen per-pool so enable_storage_netplan is computed from
each.value.storage_network and the global user_data check. Specifically, remove
usage of the global _generated_node_user_data for pools and instead call
templatefile("${path.module}/templates/node-cloud-init.tpl", { ...,
enable_storage_netplan = local._using_generated_user_data &&
(each.value.storage_network != null) }) inside the resource/for_each that
iterates var.machine_pools (use each.value and each.key), and ensure
var.user_data still takes precedence (skip template when var.user_data is
non-empty). This keeps var.user_data, var.node_password,
var.ssh_authorized_keys, var.ntp_server, ssh_user etc. passed through per-pool.
| enp2s0: | ||
| dhcp4: true | ||
| dhcp4-overrides: | ||
| use-routes: false |
There was a problem hiding this comment.
Avoid hardcoding storage NIC as enp2s0.
Line 46 assumes the storage interface is always enp2s0, but interface position can vary with vm_network/networks combinations. Pass the storage interface name (or computed index) from Terraform and render it dynamically in this template.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/workloads/k8s-cluster/templates/node-cloud-init.tpl` around lines 46
- 49, The template currently hardcodes the storage interface as "enp2s0" causing
wrong NIC selection; change the template to render the storage interface name
dynamically by accepting a template variable (e.g., storage_interface or
storage_iface_index) from Terraform and using that variable in place of "enp2s0"
in the node-cloud-init.tpl block where enp2s0 is referenced; update the
Terraform module that calls this template to compute/pass the correct interface
name or index (based on vm_network/networks) into that variable so the rendered
cloud-init uses the correct storage NIC.
| vm_network = optional(string) # primary VM network ref e.g. "kasun-test-net/kasun-test-vlan601" | ||
| storage_network = optional(string) # storage network ref e.g. "kasun-test-net/kasun-test-strg-vlan698" | ||
| networks = optional(list(string), []) # additional or legacy full network list |
There was a problem hiding this comment.
Validate network refs in machine_pools inputs.
Line 108-Line 110 accepts empty and repeated network references. Add validation to require non-empty values when set and uniqueness across [vm_network, ...networks, storage_network] to prevent invalid interface definitions downstream.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/workloads/k8s-cluster/variables.tf` around lines 108 - 110, Add
validation to the variables describing network refs so empty strings and
duplicates are rejected: for vm_network, storage_network and networks (the
variables vm_network, storage_network, networks used by machine_pools) add a
validation block that (1) ensures any provided string is non-empty and any
provided list element is non-empty (use checks on each value), and (2) builds
the combined list via concat(vm_network if set, networks, storage_network if
set) then uses compact/trim and distinct to assert
length(distinct(compact(...))) == length(compact(...)) to enforce uniqueness;
update the machine_pools input validation to reference the same rule so repeated
or empty network refs are rejected before downstream interface creation.
Purpose
This addresses a bug when using
harvester_cluster_nameto create a cluster which should passdata.rancher2_cluster.harvester[0].idinstead ofdata.rancher2_cluster.harvester[0].nameto correctly pass the id to therancher2_cluster_v2data source