Skip to content

Fail NodePublishVolume with actionable error in case Workload Identity is not yet enabled on node level #533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

hungnguyen243
Copy link
Contributor

@hungnguyen243 hungnguyen243 commented Mar 27, 2025

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test

/kind feature

/kind flake

What this PR does / why we need it:
Add a check for GKE metadata server label on node to fail MountVolume (NodePublishVolume) with an actionable error. The target scenario is when an user enabled Workload Identity on an existing cluster, but didn't update their node pool to use GKE metadata server. Currently, this error takes more effort to debug since NodePublishVolume still succeeds, but we would see R/W errors from the sidecar due to PermissionDenied as the default SA lacks the appropriate permission for the GCS bucket.

Testing documentation with custom driver image: https://paste.googleplex.com/5532151901323264

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fail MountVolume with actionable error in case Workload Identity is not yet enabled on node level

…ith actionable error in case WI is not yet enabled on node level
nodeObj.Spec = corev1.NodeSpec{}
nodeObj.Status = corev1.NodeStatus{}
nodeObj.ObjectMeta.Annotations = nil
nodeObj.ObjectMeta.Labels = newLabels
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions, why do we set nodeObj.ObjectMeta.Annotations = nil, and clear out the nodeObj.spec / status? What are we going to use to get the machineType of the node?

Also , is it intentional we replace the nodeObj.ObjectMeta.Labels to only be the GkeMetaDataServerKey? Is that the part where we "We are filtering only for relevant Node annotations to optimize memory usage." ? Or where does the filtering happen.

Comment says we do filter for releavant node annotations (its labels), but if GkeMetaDataServerKey isn't there, I tink we

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the point of the function is to fetch only the necessary information we want (i.e. constructing a new Node/Pod object with just the info/fields we need from the actual k8s object, rather than passing the whole object around). This is just my guess looking at the way ConfigurePodLister is currently working :)
When we want to get machineType from the labels, we can do similar processing and add the 'machineType' label ("node.kubernetes.io/instance-type") in newLabels.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok Sounds good. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an optimization trick we used to reduce informer mem usage

@amacaskill
Copy link
Collaborator

/lgtm
/approve

@amacaskill amacaskill merged commit 9158317 into GoogleCloudPlatform:main Apr 2, 2025
1 check passed
@hungnguyen243
Copy link
Contributor Author

/cherry-pick release 1.13

hungnguyen243 added a commit that referenced this pull request Apr 8, 2025
…533-upstream-release-1.13

Automated cherry pick of #533: Fail NodePublishVolume with actionable error in case Workload Identity is not yet enabled on node level
hungnguyen243 added a commit that referenced this pull request Apr 9, 2025
…tionable error in case Workload Identity is not yet enabled on node level"
hungnguyen243 added a commit that referenced this pull request Apr 9, 2025
…-cherry-pick-of-#533-upstream-release-1.13

Revert "Automated cherry pick of #533: Fail NodePublishVolume with actionable error in case Workload Identity is not yet enabled on node level"
@forking-renovate forking-renovate bot mentioned this pull request Jun 20, 2025
76 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants