Skip to content

Fix creating cluster when using harvester cluster name instead of id#101

Open
iamtrazy wants to merge 2 commits into
wso2:mainfrom
iamtrazy:main
Open

Fix creating cluster when using harvester cluster name instead of id#101
iamtrazy wants to merge 2 commits into
wso2:mainfrom
iamtrazy:main

Conversation

@iamtrazy
Copy link
Copy Markdown
Contributor

Purpose

This addresses a bug when using harvester_cluster_name to create a cluster which should pass data.rancher2_cluster.harvester[0].id instead of data.rancher2_cluster.harvester[0].name to correctly pass the id to the rancher2_cluster_v2 data source

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Review Change Stack

📝 Walkthrough

Summary

Fixes Harvester cluster provisioning by ensuring the Rancher v2 cluster lookup uses the Harvester cluster ID rather than the cluster name.

Changes

  • Updated modules/workloads/k8s-cluster/main.tf so data.rancher2_cluster_v2.harvester receives the upstream Harvester cluster ID (data.rancher2_cluster.harvester[0].id) when harvester_cluster_name is used, falling back to var.harvester_cluster_id otherwise. This ensures the correct cluster identifier is passed when creating cloud credentials and requesting the Harvester kubeconfig.

File modified: modules/workloads/k8s-cluster/main.tf

Walkthrough

The PR adds a new Rancher project role template project-member-restricted and exposes its ID. Tenant-space gains optional vm/storage VLAN inputs, conditional creation of a network namespace and two Harvester networks, and an optional shared-image read-only binding per group. The k8s-cluster module updates machine_pools to accept vm_network/storage_network, conditionally injects storage netplan into generated node cloud-init, and updates the Harvester cluster data lookup to query by upstream cluster ID.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Terraform
  participant Rancher
  participant HarvesterCluster
  User->>Terraform: apply with vm/storage VLANs & groups
  Terraform->>Rancher: create role template (project-member-restricted)
  Terraform->>Rancher: create namespace, harvester_network.vm/storage
  Terraform->>Rancher: create project role bindings for shared images
  Terraform->>HarvesterCluster: lookup cluster by upstream ID
  Terraform->>Nodes: render cloud-init with enable_storage_netplan (when applicable)
  Nodes->>Nodes: netplan generate / apply (conditional)
Loading

Suggested reviewers

  • HiranAdikari
  • gnudeep
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description addresses the immediate bug fix but lacks depth on broader changes. The changeset includes significant infrastructure additions (VM/storage networks, RBAC role, shared image access) that are not mentioned, suggesting the description may be incomplete or misleading about the full scope. Clarify whether the PR scope is limited to the harvester cluster ID fix, or expand the description to document the additional networking, RBAC, and tenant-space features being introduced alongside this bug fix.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the primary bug fix of the changeset: correcting cluster creation when using harvester cluster name by passing the cluster ID instead.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/management/tenant-space/main.tf (1)

136-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject configs that set both vlan_id and vm_network_vlan_id.

These two paths create VM harvester_network resources in the same namespace and use the same default naming convention. A caller that keeps vlan_id populated while adopting the new single-VLAN input can end up with duplicate or conflicting network creation at apply time. Add a module precondition to make the inputs mutually exclusive.

Suggested change
   lifecycle {
     ignore_changes = [container_resource_limit]
+    precondition {
+      condition     = var.vm_network_vlan_id == null || var.vlan_id == null || length(var.vlan_id) == 0
+      error_message = "vm_network_vlan_id and vlan_id are mutually exclusive. Use vm_network_vlan_id for the simple single-VLAN path, or vlan_id for the legacy/multi-VLAN path."
+    }
     precondition {
       condition = var.cpu_limit != null || alltrue([
         var.memory_limit == null,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/management/tenant-space/main.tf` around lines 136 - 169, Add a
validation/precondition that rejects configs where both inputs are set by
asserting that not both var.vlan_id and var.vm_network_vlan_id are non-null;
e.g., add a validation block (on var.vm_network_vlan_id or as a top-level
precondition) with condition = !(var.vlan_id != null && var.vm_network_vlan_id
!= null) and a clear error_message explaining the mutual-exclusion, so the
module never tries to create overlapping harvester_network resources
(harvester_network.tenant and harvester_network.vm).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modules/management/cluster-roles/main.tf`:
- Around line 385-438: The role currently allows creating namespaces (the rules
block with api_groups = [""] and resources = ["namespaces"]) but lacks read-only
visibility for namespaces and project objects; add two rules: one granting
get/list/watch on core namespaces (api_groups = [""], resources =
["namespaces"], verbs = ["get","list","watch"]) and one granting get/list/watch
on projects in management.cattle.io (api_groups = ["management.cattle.io"],
resources = ["projects"], verbs = ["get","list","watch"]) so this role can act
as a drop-in replacement for project-member; place them alongside the other
rules blocks (e.g., near the existing namespaces create rule and the
clusters/projects rules).

In `@modules/management/tenant-space/variables.tf`:
- Around line 207-214: The variable enable_shared_image_access currently
defaults to true which causes existing tenant spaces to automatically gain
shared-image bindings on apply; change its default to false in the variables.tf
variable block for enable_shared_image_access and update the variable
description to reflect the new default (opt-in behavior) so existing tenants are
not affected unless they explicitly enable it.

In `@modules/workloads/k8s-cluster/main.tf`:
- Around line 18-35: The generated user_data currently computes a single global
local._inject_storage_netplan and _generated_node_user_data which gets applied
to all pools; change the template rendering to happen per-pool so
enable_storage_netplan is computed from each.value.storage_network and the
global user_data check. Specifically, remove usage of the global
_generated_node_user_data for pools and instead call
templatefile("${path.module}/templates/node-cloud-init.tpl", { ...,
enable_storage_netplan = local._using_generated_user_data &&
(each.value.storage_network != null) }) inside the resource/for_each that
iterates var.machine_pools (use each.value and each.key), and ensure
var.user_data still takes precedence (skip template when var.user_data is
non-empty). This keeps var.user_data, var.node_password,
var.ssh_authorized_keys, var.ntp_server, ssh_user etc. passed through per-pool.

In `@modules/workloads/k8s-cluster/templates/node-cloud-init.tpl`:
- Around line 46-49: The template currently hardcodes the storage interface as
"enp2s0" causing wrong NIC selection; change the template to render the storage
interface name dynamically by accepting a template variable (e.g.,
storage_interface or storage_iface_index) from Terraform and using that variable
in place of "enp2s0" in the node-cloud-init.tpl block where enp2s0 is
referenced; update the Terraform module that calls this template to compute/pass
the correct interface name or index (based on vm_network/networks) into that
variable so the rendered cloud-init uses the correct storage NIC.

In `@modules/workloads/k8s-cluster/variables.tf`:
- Around line 108-110: Add validation to the variables describing network refs
so empty strings and duplicates are rejected: for vm_network, storage_network
and networks (the variables vm_network, storage_network, networks used by
machine_pools) add a validation block that (1) ensures any provided string is
non-empty and any provided list element is non-empty (use checks on each value),
and (2) builds the combined list via concat(vm_network if set, networks,
storage_network if set) then uses compact/trim and distinct to assert
length(distinct(compact(...))) == length(compact(...)) to enforce uniqueness;
update the machine_pools input validation to reference the same rule so repeated
or empty network refs are rejected before downstream interface creation.

---

Outside diff comments:
In `@modules/management/tenant-space/main.tf`:
- Around line 136-169: Add a validation/precondition that rejects configs where
both inputs are set by asserting that not both var.vlan_id and
var.vm_network_vlan_id are non-null; e.g., add a validation block (on
var.vm_network_vlan_id or as a top-level precondition) with condition =
!(var.vlan_id != null && var.vm_network_vlan_id != null) and a clear
error_message explaining the mutual-exclusion, so the module never tries to
create overlapping harvester_network resources (harvester_network.tenant and
harvester_network.vm).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77da7fd0-f87e-4b7e-b922-85fd12b85996

📥 Commits

Reviewing files that changed from the base of the PR and between 600e171 and 34872bf.

📒 Files selected for processing (7)
  • modules/management/cluster-roles/main.tf
  • modules/management/cluster-roles/outputs.tf
  • modules/management/tenant-space/main.tf
  • modules/management/tenant-space/variables.tf
  • modules/workloads/k8s-cluster/main.tf
  • modules/workloads/k8s-cluster/templates/node-cloud-init.tpl
  • modules/workloads/k8s-cluster/variables.tf

Comment on lines +385 to +438
rules {
api_groups = [""]
resources = ["namespaces"]
verbs = ["create"]
}

rules {
api_groups = [""]
resources = ["persistentvolumes"]
verbs = ["get", "list", "watch"]
}

rules {
api_groups = ["storage.k8s.io"]
resources = ["storageclasses"]
verbs = ["get", "list", "watch"]
}

rules {
api_groups = ["apiregistration.k8s.io"]
resources = ["apiservices"]
verbs = ["get", "list", "watch"]
}

rules {
api_groups = [""]
resources = ["persistentvolumeclaims"]
verbs = ["*"]
}

rules {
api_groups = ["metrics.k8s.io"]
resources = ["pods"]
verbs = ["*"]
}

rules {
api_groups = ["management.cattle.io"]
resources = ["clusterevents"]
verbs = ["get", "list", "watch"]
}

rules {
api_groups = ["catalog.cattle.io"]
resources = ["clusterrepos", "operations", "releases", "apps"]
verbs = ["get", "list", "watch"]
}

rules {
api_groups = ["management.cattle.io"]
resources = ["clusters"]
resource_names = ["local"]
verbs = ["get"]
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add read-only project and namespace visibility to this replacement role.

This role can create namespaces, but it cannot read namespaces or the project object itself. That makes it an incomplete drop-in replacement for project-member and can block basic project visibility for users bound only to this role.

Proposed fix
   rules {
     api_groups = [""]
     resources  = ["namespaces"]
-    verbs      = ["create"]
+    verbs      = ["get", "list", "watch", "create"]
   }
+
+  rules {
+    api_groups = ["management.cattle.io"]
+    resources  = ["projects"]
+    verbs      = ["get", "list", "watch"]
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rules {
api_groups = [""]
resources = ["namespaces"]
verbs = ["create"]
}
rules {
api_groups = [""]
resources = ["persistentvolumes"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["storage.k8s.io"]
resources = ["storageclasses"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["apiregistration.k8s.io"]
resources = ["apiservices"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = [""]
resources = ["persistentvolumeclaims"]
verbs = ["*"]
}
rules {
api_groups = ["metrics.k8s.io"]
resources = ["pods"]
verbs = ["*"]
}
rules {
api_groups = ["management.cattle.io"]
resources = ["clusterevents"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["catalog.cattle.io"]
resources = ["clusterrepos", "operations", "releases", "apps"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["management.cattle.io"]
resources = ["clusters"]
resource_names = ["local"]
verbs = ["get"]
}
rules {
api_groups = [""]
resources = ["namespaces"]
verbs = ["get", "list", "watch", "create"]
}
rules {
api_groups = ["management.cattle.io"]
resources = ["projects"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = [""]
resources = ["persistentvolumes"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["storage.k8s.io"]
resources = ["storageclasses"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["apiregistration.k8s.io"]
resources = ["apiservices"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = [""]
resources = ["persistentvolumeclaims"]
verbs = ["*"]
}
rules {
api_groups = ["metrics.k8s.io"]
resources = ["pods"]
verbs = ["*"]
}
rules {
api_groups = ["management.cattle.io"]
resources = ["clusterevents"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["catalog.cattle.io"]
resources = ["clusterrepos", "operations", "releases", "apps"]
verbs = ["get", "list", "watch"]
}
rules {
api_groups = ["management.cattle.io"]
resources = ["clusters"]
resource_names = ["local"]
verbs = ["get"]
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/management/cluster-roles/main.tf` around lines 385 - 438, The role
currently allows creating namespaces (the rules block with api_groups = [""] and
resources = ["namespaces"]) but lacks read-only visibility for namespaces and
project objects; add two rules: one granting get/list/watch on core namespaces
(api_groups = [""], resources = ["namespaces"], verbs = ["get","list","watch"])
and one granting get/list/watch on projects in management.cattle.io (api_groups
= ["management.cattle.io"], resources = ["projects"], verbs =
["get","list","watch"]) so this role can act as a drop-in replacement for
project-member; place them alongside the other rules blocks (e.g., near the
existing namespaces create rule and the clusters/projects rules).

Comment on lines +207 to +214
# Backwards-compatible: defaults to false, so existing tenant spaces are
# completely unaffected unless they explicitly opt in.

variable "enable_shared_image_access" {
type = bool
description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)."
default = true
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep shared image access opt-in.

Defaulting this to true makes existing tenants with group_role_bindings pick up new shared-image bindings on the next apply, which is a behavior change rather than an optional feature. Set the default to false, or document this as an intentional breaking change.

Suggested change
 variable "enable_shared_image_access" {
   type        = bool
   description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)."
-  default     = true
+  default     = false
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Backwards-compatible: defaults to false, so existing tenant spaces are
# completely unaffected unless they explicitly opt in.
variable "enable_shared_image_access" {
type = bool
description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)."
default = true
}
# Backwards-compatible: defaults to false, so existing tenant spaces are
# completely unaffected unless they explicitly opt in.
variable "enable_shared_image_access" {
type = bool
description = "When true (default), creates a Rancher read-only project role binding for each unique group in group_role_bindings to the shared images project. Set to false only for tenant spaces that should not see the shared image catalogue (e.g. the shared space itself)."
default = false
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/management/tenant-space/variables.tf` around lines 207 - 214, The
variable enable_shared_image_access currently defaults to true which causes
existing tenant spaces to automatically gain shared-image bindings on apply;
change its default to false in the variables.tf variable block for
enable_shared_image_access and update the variable description to reflect the
new default (opt-in behavior) so existing tenants are not affected unless they
explicitly enable it.

Comment on lines +18 to +35
# Inject the storage interface netplan (enp2s0 / use-routes: false) when:
# - at least one pool declares a storage_network, AND
# - the caller is using generated user_data (not an explicit user_data string).
# Callers passing explicit user_data are responsible for the netplan themselves.
_using_generated_user_data = var.user_data == null || var.user_data == ""
_any_pool_has_storage = anytrue([for p in var.machine_pools : p.storage_network != null])
_inject_storage_netplan = local._any_pool_has_storage && local._using_generated_user_data

# Generate node cloud-init from first-class variables when user_data is not provided.
# user_data (non-empty string) takes full precedence — generation is skipped entirely.
_generated_node_user_data = (var.node_password != null || length(var.ssh_authorized_keys) > 0 || var.ntp_server != "") ? templatefile(
_generated_node_user_data = (var.node_password != null || length(var.ssh_authorized_keys) > 0 || var.ntp_server != "" || local._inject_storage_netplan) ? templatefile(
"${path.module}/templates/node-cloud-init.tpl",
{
ssh_user = var.ssh_user
node_password = var.node_password
ssh_authorized_keys = var.ssh_authorized_keys
ntp_server = var.ntp_server
ssh_user = var.ssh_user
node_password = var.node_password
ssh_authorized_keys = var.ssh_authorized_keys
ntp_server = var.ntp_server
enable_storage_netplan = local._inject_storage_netplan
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Generate storage netplan flags per pool, not globally.

Line 23-Line 35 computes one global _inject_storage_netplan, and Line 127 applies the resulting user_data to every pool. In mixed configurations, pools without storage_network still receive storage netplan settings. Move template rendering to pool scope (each.value) so enable_storage_netplan is evaluated per pool.

Also applies to: 127-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/workloads/k8s-cluster/main.tf` around lines 18 - 35, The generated
user_data currently computes a single global local._inject_storage_netplan and
_generated_node_user_data which gets applied to all pools; change the template
rendering to happen per-pool so enable_storage_netplan is computed from
each.value.storage_network and the global user_data check. Specifically, remove
usage of the global _generated_node_user_data for pools and instead call
templatefile("${path.module}/templates/node-cloud-init.tpl", { ...,
enable_storage_netplan = local._using_generated_user_data &&
(each.value.storage_network != null) }) inside the resource/for_each that
iterates var.machine_pools (use each.value and each.key), and ensure
var.user_data still takes precedence (skip template when var.user_data is
non-empty). This keeps var.user_data, var.node_password,
var.ssh_authorized_keys, var.ntp_server, ssh_user etc. passed through per-pool.

Comment on lines +46 to +49
enp2s0:
dhcp4: true
dhcp4-overrides:
use-routes: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid hardcoding storage NIC as enp2s0.

Line 46 assumes the storage interface is always enp2s0, but interface position can vary with vm_network/networks combinations. Pass the storage interface name (or computed index) from Terraform and render it dynamically in this template.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/workloads/k8s-cluster/templates/node-cloud-init.tpl` around lines 46
- 49, The template currently hardcodes the storage interface as "enp2s0" causing
wrong NIC selection; change the template to render the storage interface name
dynamically by accepting a template variable (e.g., storage_interface or
storage_iface_index) from Terraform and using that variable in place of "enp2s0"
in the node-cloud-init.tpl block where enp2s0 is referenced; update the
Terraform module that calls this template to compute/pass the correct interface
name or index (based on vm_network/networks) into that variable so the rendered
cloud-init uses the correct storage NIC.

Comment on lines +108 to +110
vm_network = optional(string) # primary VM network ref e.g. "kasun-test-net/kasun-test-vlan601"
storage_network = optional(string) # storage network ref e.g. "kasun-test-net/kasun-test-strg-vlan698"
networks = optional(list(string), []) # additional or legacy full network list
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate network refs in machine_pools inputs.

Line 108-Line 110 accepts empty and repeated network references. Add validation to require non-empty values when set and uniqueness across [vm_network, ...networks, storage_network] to prevent invalid interface definitions downstream.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/workloads/k8s-cluster/variables.tf` around lines 108 - 110, Add
validation to the variables describing network refs so empty strings and
duplicates are rejected: for vm_network, storage_network and networks (the
variables vm_network, storage_network, networks used by machine_pools) add a
validation block that (1) ensures any provided string is non-empty and any
provided list element is non-empty (use checks on each value), and (2) builds
the combined list via concat(vm_network if set, networks, storage_network if
set) then uses compact/trim and distinct to assert
length(distinct(compact(...))) == length(compact(...)) to enforce uniqueness;
update the machine_pools input validation to reference the same rule so repeated
or empty network refs are rejected before downstream interface creation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant