mzazrivec
Contributor

@mzazrivec mzazrivec commented Apr 9, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request implements a CRD and a controller for provisioning the complete networking infrastructure required to install a ROSA-HCP cluster in AWS. The proposal for this implementation is described in #5381.

Under the hood, the implementation uses a CloudFormation stack and a static (i.e. not customizable) CloudFormation template from rosa-cli.

This pull request depends on openshift/rosa#2904 (now merged).

Quick howto:

$ export ROSA_NETWORK_NAME=rosa-net-01
$ export AWS_REGION=us-west-2
$ export AVAILABILITY_ZONE_COUNT=2
$ export CIDR_BLOCK=10.0.0.0/16
$ clusterctl generate yaml --from templates/rosa-network.yaml > rosa-net-01.yaml
$ kubectl apply -f rosa-net-01.yaml

To use the ROSANetwork from a ROSA control plane:

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: ROSAControlPlane
metadata:
  name: rosa-hcp01-control-plane
  namespace: default
spec:
  rosaNetworkRef:
    name: rosa-net-01

Then omit the subnets and availability zones from the control plane spec.
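Here is a minimal sketch of the rule stated above. It uses a hypothetical, trimmed-down spec type, not the actual ROSAControlPlane API or its webhook. The check rejects a spec that sets both rosaNetworkRef and explicit subnets or availability zones. Whether the real controller enforces this in exactly this way is not stated in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// controlPlaneNetworkSpec is a hypothetical stand-in for the
// networking-related fields of a ROSAControlPlane spec.
type controlPlaneNetworkSpec struct {
	ROSANetworkRefName string   // name of the referenced ROSANetwork, if any
	Subnets            []string // explicit subnet IDs
	AvailabilityZones  []string // explicit availability zones
}

// validateNetworkSource rejects specs that combine a ROSANetwork reference
// with explicit subnets or availability zones, because the referenced
// ROSANetwork is expected to supply those values.
func validateNetworkSource(spec controlPlaneNetworkSpec) error {
	if spec.ROSANetworkRefName == "" {
		return nil // no reference: explicit networking fields are allowed
	}
	if len(spec.Subnets) > 0 || len(spec.AvailabilityZones) > 0 {
		return errors.New("subnets and availabilityZones must be empty when rosaNetworkRef is set")
	}
	return nil
}

func main() {
	ok := controlPlaneNetworkSpec{ROSANetworkRefName: "rosa-net-01"}
	bad := controlPlaneNetworkSpec{ROSANetworkRefName: "rosa-net-01", Subnets: []string{"subnet-0123"}}
	fmt.Println(validateNetworkSource(ok) == nil)  // true
	fmt.Println(validateNetworkSource(bad) != nil) // true
}
```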

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

New API for provisioning network infrastructure for ROSA clusters

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority labels Apr 9, 2025
@k8s-ci-robot k8s-ci-robot requested review from faiq and serngawy April 9, 2025 19:27
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andidog for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 9, 2025
@k8s-ci-robot
Contributor

Hi @mzazrivec. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

webhookClientConfig:
  # this is "\n" used as a placeholder, otherwise it will be rejected by the apiserver for being blank,
  # but we're going to set it later using the cert-manager (or potentially a patch if not using cert-manager)
  caBundle: Cg==
Contributor

No need to add the caBundle.

Resource string `json:"resource"`

// Identifier of the created resource. Will be filled in once the resource is created and ready
ID string `json:"ID"`
Contributor

Suggested change
ID string `json:"ID"`
Id string `json:"id"`

Or resourceId

// CFResource groups information pertaining to a resource created as a part of a cloudformation stack
type CFResource struct {
// Name of the created resource: NATGateway1, VPC, SecurityGroup, ...
Resource string `json:"resource"`
Contributor

Suggested change
Resource string `json:"resource"`
Name string `json:"name"`

Or resourceName

Status string `json:"status"`

// Message pertaining to the status of the resource
Reason string `json:"reason"`
Contributor

Message is better, I guess?

Suggested change
Reason string `json:"reason"`
Message string `json:"message"`

// Availability zone of the subnet pair
AvailabilityZone string `json:"availabilityZone"`

// ID of the public subnet
Contributor

Suggested change
// ID of the public subnet
// Public subnet ID, e.g. subnet-xxxxxxxxxx

main.go
@@ -284,6 +284,15 @@ func main() {
}
}

// TODO: feature gates?
Contributor

I don't think we need a new feature gate; we can put it under the ROSA feature gate.

Contributor Author

Yes. I did not mean a new feature gate here, just the existing ROSA feature gate.

@serngawy
Contributor

You also need to update the ValidatingWebhookConfiguration and MutatingWebhookConfiguration here.

@mzazrivec mzazrivec force-pushed the rosa_network branch 4 times, most recently from 5907fb1 to 24a5950 Compare April 24, 2025 13:20
@mzazrivec mzazrivec force-pushed the rosa_network branch 3 times, most recently from a947563 to a255790 Compare May 19, 2025 13:43
Contributor

@serngawy serngawy left a comment

/ok-to-test

// If no identity is specified, the default identity for this controller will be used.
//
// +optional
IdentityRef *infrav1.AWSIdentityReference `json:"identityRef,omitempty"`
Contributor

Not sure if we want to provide this option to the end user. We don't do that with ROSAControlPlane, which only uses the default AWS identity. However, we should provide an OCM identityRef.

Contributor Author

Why shouldn't we provide this option to the end user? We need to specify the ref to the aws secret somehow. Here I'm just reusing existing structures & code.

What do you mean by OCM identity ref? OCM will not be involved here in any way.

Contributor

Well, to use openshift/rosa and establish an OCM client, you need OCM authentication. Is this not the case with the ROSANetwork CF stack creation?

Contributor Author

No. No OCM credentials are needed for rosanet, just AWS credentials.

Contributor

@serngawy Are you satisfied with the answers here?

Contributor

@serngawy serngawy Aug 18, 2025

@mzazrivec I do remember we discussed that, but after checking the ROSANetwork CloudFormation stack template, I see tags such as rosa_hcp_policy and a rosa service tag being added here.
I think those tags are used to check for privileges?
I think we have to authenticate the OCM credentials. Even if they aren't needed to create the CF stack, the end user must be a valid OCM user.

Contributor Author

@serngawy What does creating VPC with certain tags and checking OCM credentials have to do with each other?

Contributor

Okay, as we discussed, there is no need for OCM authentication.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 28, 2025
@mzazrivec mzazrivec force-pushed the rosa_network branch 4 times, most recently from d2534a7 to dcc599d Compare June 9, 2025 08:19
@mzazrivec mzazrivec changed the title ✨ RosaNetwork: new CRD & reconciler to provision network infrastructure for ROSA-HCP ✨ ROSANetwork: new CRD & reconciler to provision network infrastructure for ROSA-HCP Aug 14, 2025
@serngawy
Contributor

Please add a release note stating that a new ROSA network API is added.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 14, 2025
@mzazrivec
Contributor Author

please add release note, stating add new ROSA network API

Added.


const (
// ROSANetworkReadyCondition condition reports on the successful reconciliation of ROSANetwork.
ROSANetworkReadyCondition clusterv1.ConditionType = "ROSANetworkReady"
Contributor

This isn't a blocker because we'll need to revisit this across the entire project, but conditions will be changing upstream and we'll need to revisit these to make sure they're conforming to the accepted proposal.

Contributor Author

Yep, you mentioned in my proposal PR that the Reason field needs to be set in the case of success as well, which is what this PR does.
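The self-contained sketch below shows a Reason being set on success as well as on failure. The Condition type and setCondition helper are hypothetical stand-ins, not the Cluster API condition utilities. The reason strings Creating and Created are made up for illustration.

```go
package main

import "fmt"

// Condition is a hypothetical, minimal stand-in for a Cluster API condition.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string // set on success as well as on failure
	Message string
}

// setCondition replaces an existing condition of the same type or appends a new one.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []Condition
	// While the CloudFormation stack is being created.
	conds = setCondition(conds, Condition{Type: "ROSANetworkReady", Status: "False", Reason: "Creating", Message: "stack creation in progress"})
	// After success: Status flips to True and Reason is still populated.
	conds = setCondition(conds, Condition{Type: "ROSANetworkReady", Status: "True", Reason: "Created"})
	fmt.Println(len(conds), conds[0].Status, conds[0].Reason) // 1 True Created
}
```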

Contributor

@nrb nrb left a comment

I have some suggestions for idiomatic Go style that shouldn't affect the functionality of the PR.

Also have some questions about a few of the interface design decisions.

Once those are answered and we have some baseline testing for the ROSANetwork controller, I think we'll be in good shape.

}

// ROSANetworkFinalizer allows the controller to clean up resources on delete.
const ROSANetworkFinalizer = "rosanetwork.infrastructure.cluster.x-k8s.io"
Contributor

Most other packages in the repo put these finalizer definitions in the corresponding api/<version>/<name>_types.go files, but I see there's other ROSA files that put their finalizers in the controller package.

Ideally we'd clean this up, but it's not a blocker.

Contributor

Agreed, moving this to the API is better.

Contributor Author

@nrb yep, moved.

if apierrors.IsNotFound(err) {
return ctrl.Result{}, nil
}
return ctrl.Result{Requeue: true}, nil
Contributor

We do nothing with the error at all? Should we perhaps log it, so the user can at least see that errors are causing reconcile loops?

Contributor Author

I'm glad you asked, because I asked this very question within my team last week.

Short answer to your original question would be: this same pattern is in several places in CAPA.

Of course, I can add the error logging there. But the same thing would probably have to be done in:

controlplane/rosa/controllers/rosacontrolplane_controller.go
exp/controllers/awsfargatepool_controller.go
exp/controllers/awsmanagedmachinepool_controller.go
exp/controllers/rosamachinepool_controller.go

Contributor

@mzazrivec Just return the error here; the other controllers can be handled in another PR.

Comment on lines 144 to 143
for i := 1; i <= len(rosaNetScope.ROSANetwork.Spec.AvailabilityZones); i++ {
cfParams[fmt.Sprintf("AZ%d", i)] = rosaNetScope.ROSANetwork.Spec.AvailabilityZones[i-1]
}
Contributor

Suggested change
for i := 1; i <= len(rosaNetScope.ROSANetwork.Spec.AvailabilityZones); i++ {
cfParams[fmt.Sprintf("AZ%d", i)] = rosaNetScope.ROSANetwork.Spec.AvailabilityZones[i-1]
}
for i, zone := range rosaNetScope.ROSANetwork.Spec.AvailabilityZones {
cfParams[fmt.Sprintf("AZ%d", i+1)] = zone
}

is more Go-native.

Contributor Author

@nrb yep, changed
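Below is a self-contained sketch of the range-based loop. Because range yields a 0-based index, the key needs i+1 to keep the AZ1, AZ2, ... parameter names that the original 1-based loop produces. The azParams helper name is hypothetical.

```go
package main

import "fmt"

// azParams maps availability zones onto 1-based CloudFormation parameter
// names (AZ1, AZ2, ...), matching the numbering of the original loop.
func azParams(zones []string) map[string]string {
	params := make(map[string]string, len(zones))
	for i, zone := range zones {
		// range gives a 0-based index; shift by one so the first key is AZ1.
		params[fmt.Sprintf("AZ%d", i+1)] = zone
	}
	return params
}

func main() {
	fmt.Println(azParams([]string{"us-west-2a", "us-west-2b"})) // map[AZ1:us-west-2a AZ2:us-west-2b]
}
```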

for i := 1; i <= len(rosaNetScope.ROSANetwork.Spec.AvailabilityZones); i++ {
cfParams[fmt.Sprintf("AZ%d", i)] = rosaNetScope.ROSANetwork.Spec.AvailabilityZones[i-1]
}
cfTags := map[string]string{}
Contributor

Should probably have some tags identifying the resources in AWS, in case an administrator is looking at the AWS console and wondering where the CF resources came from.

You can look into how we do it for clusterawsadm, and find a Tag type in the API.

Contributor Author

I'm not entirely sure I understand what you mean here.

The particular stack resources (VPC, subnets, ...) are already being tagged as is defined in the CF template from rosa-cli.

If you're talking about tagging the CF stack object itself, then I don't see clusterawsadm bootstrap creating those.

Contributor

Expose a tags field in ROSANetworkSpec so the administrator (end user) can add tags to the created VPC. It's a common use of tags with AWS resources.
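The sketch below shows one way such a field could be merged with controller-owned tags. It assumes a hypothetical Tags map[string]string on ROSANetworkSpec and a hypothetical managed-by tag key. Applying the controller's tags last is one possible design choice: it stops user input from overwriting tags the controller relies on to identify its stacks.

```go
package main

import (
	"fmt"
	"maps"
)

// stackTags merges user-supplied tags (from a hypothetical Tags field on
// ROSANetworkSpec) with tags the controller sets to identify its resources.
// Controller tags are copied last, so they win on key conflicts.
func stackTags(userTags, controllerTags map[string]string) map[string]string {
	out := make(map[string]string, len(userTags)+len(controllerTags))
	maps.Copy(out, userTags)
	maps.Copy(out, controllerTags)
	return out
}

func main() {
	tags := stackTags(
		map[string]string{"team": "networking", "managed-by": "someone"},
		map[string]string{"managed-by": "rosanetwork-controller"}, // hypothetical identifying tag
	)
	fmt.Println(tags) // map[managed-by:rosanetwork-controller team:networking]
}
```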

}
}

if resource.LogicalID[0:13] == "SubnetPrivate" {
Contributor

Contributor Author

Sure, done.
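The reviewer's comment text on the slice expression above is not preserved in this page capture, so the following is an assumption about the kind of fix involved. resource.LogicalID[0:13] panics for any logical ID shorter than 13 bytes, such as VPC. strings.HasPrefix performs the same check safely:

```go
package main

import (
	"fmt"
	"strings"
)

// isPrivateSubnet reports whether a CloudFormation logical ID names a private subnet.
// Unlike LogicalID[0:13], strings.HasPrefix does not panic on IDs shorter than 13 bytes.
func isPrivateSubnet(logicalID string) bool {
	return strings.HasPrefix(logicalID, "SubnetPrivate")
}

func main() {
	for _, id := range []string{"SubnetPrivate1", "SubnetPublic1", "VPC"} {
		fmt.Println(id, isPrivateSubnet(id))
	}
}
```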

Comment on lines +301 to +302
i := 0
for _, v := range subnets {
rosaNet.Status.Subnets[i] = *v
i++
}
Contributor

Suggested change
i := 0
for _, v := range subnets {
rosaNet.Status.Subnets[i] = *v
i++
}
rosaNet.Status.Subnets = make([]expinfrav1.ROSANetworkSubnet, 0)
rosaNet.Status.Subnets = append(rosaNet.Status.Subnets, subnets...)

will copy the entries into a new slice. Note that if you specify a length with the make call when doing it this way, you'll see double the number of entries that you'd expect.

It also doesn't look like we're doing anything with i here. If you want the index produced by range, you can do for i, v := range subnets.

Contributor Author

Not sure I understand. rosaNet.Status.Subnets should in the end be an array of objects, looking like this:

kind: ROSANetwork
metadata:
  name: rosa-network-01
  namespace: default
status:
  subnets:
  - availabilityZone: us-west-2a
    privateSubnet: subnet-1d9f28ba992a83514
    publicSubnet: subnet-0d9f28ba991b93514
  - availabilityZone: us-west-2b
    privateSubnet: subnet-2d7f58c09f1b43512
    publicSubnet: subnet-2d7f18c09f1b43512
  - availabilityZone: us-west-2c
    privateSubnet: subnet-7d7e19c0af1f4d57f
    publicSubnet: subnet-1d7e19c0af1c4c57f

In my code, subnets is a map (subnets := make(map[string]*expinfrav1.ROSANetworkSubnet)), so len(subnets) returns the number of keys, which is what I want. I certainly don't want to append subnets itself to rosaNet.Status.Subnets, just its values.

Contributor

@mzazrivec It's better to use a native function to get the map values instead of a for loop; check here.
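Here is a sketch of that suggestion. It assumes Go 1.23 or later, where maps.Keys returns an iterator and slices.Sorted collects and sorts it. It also assumes the map is keyed by availability zone. Map iteration order is random, so sorting the keys also keeps the status list stable across reconciles. The type below is a trimmed-down stand-in for expinfrav1.ROSANetworkSubnet.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// ROSANetworkSubnet is a trimmed-down stand-in for the status type shown above.
type ROSANetworkSubnet struct {
	AvailabilityZone string
	PrivateSubnet    string
	PublicSubnet     string
}

// subnetsByAZ flattens the per-AZ map into a slice ordered by availability
// zone, so repeated reconciles write the status in the same order.
func subnetsByAZ(subnets map[string]*ROSANetworkSubnet) []ROSANetworkSubnet {
	out := make([]ROSANetworkSubnet, 0, len(subnets))
	for _, az := range slices.Sorted(maps.Keys(subnets)) {
		out = append(out, *subnets[az])
	}
	return out
}

func main() {
	subnets := map[string]*ROSANetworkSubnet{
		"us-west-2b": {AvailabilityZone: "us-west-2b", PrivateSubnet: "subnet-b-private", PublicSubnet: "subnet-b-public"},
		"us-west-2a": {AvailabilityZone: "us-west-2a", PrivateSubnet: "subnet-a-private", PublicSubnet: "subnet-a-public"},
	}
	for _, s := range subnetsByAZ(subnets) {
		fmt.Println(s.AvailabilityZone, s.PrivateSubnet, s.PublicSubnet)
	}
}
```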

}

// Close closes the current scope persisting the rosanetwork configuration and status.
func (s *ROSANetworkScope) Close() error {
Contributor

Why have this extra function if it only calls PatchObject and returns any error it gets?

Contributor Author

Yeah. Again, this is something all the other scope objects do. I can change it, of course, but it's a pattern, and if it's not desirable, it should probably be changed everywhere.

@@ -104,4 +104,6 @@ type SessionMetadata interface {
InfraCluster() ClusterObject
// IdentityRef returns the AWS infrastructure cluster identityRef.
IdentityRef() *infrav1.AWSIdentityReference
// ControllerName returns the controller name
ControllerName() string
Contributor

Why is the controller being exposed here?

Contributor Author

Because I need the controller name when constructing the key for the session cache in pkg/cloud/scope/session.go (see my change in getSessionName() there). Without that change, we could hit conflicts in the cache if, for example, a ROSANetwork and a ROSAControlPlane had the same name and lived in the same namespace.
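The sketch below illustrates the idea but is not the actual getSessionName() in pkg/cloud/scope/session.go; the sessionCacheKey helper is hypothetical. It uses / as a separator because Kubernetes namespaces and object names cannot contain it, so valid namespace and name pairs cannot produce overlapping keys.

```go
package main

import "fmt"

// sessionCacheKey builds a hypothetical cache key for an AWS session.
// Including the controller name keeps a ROSANetwork and a ROSAControlPlane
// that share a namespace and name from colliding on the same cache entry.
func sessionCacheKey(controllerName, namespace, name string) string {
	return fmt.Sprintf("%s/%s/%s", controllerName, namespace, name)
}

func main() {
	fmt.Println(sessionCacheKey("rosanetwork", "default", "rosa-hcp01"))
	fmt.Println(sessionCacheKey("rosacontrolplane", "default", "rosa-hcp01"))
}
```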

g.Expect(rosaNetScope.Session()).ToNot(BeNil())

// AWSClusterStaticIdentity
rosaNetwork.Spec.IdentityRef.Name = "cluster-static-identity"
Contributor

Will other identity types be supported in the future?

Contributor Author

Yep, all the current identities should work with rosanet.

limitations under the License.
*/

package controllers
Contributor

It would be nice to have some basic tests for the controller.

Contributor Author

Yes, I added a basic exp/controllers/rosanetwork_controller_test.go, which I plan to keep extending.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 21, 2025
@mzazrivec mzazrivec force-pushed the rosa_network branch 12 times, most recently from 6a884a4 to e646926 Compare August 28, 2025 15:18
@k8s-ci-robot
Contributor

@mzazrivec: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                   Commit   Required  Rerun command
pull-cluster-api-provider-aws-apidiff-main  2c7bf06  false     /test pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-verify        2c7bf06  true      /test pull-cluster-api-provider-aws-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants