SDN-4919,OCPBUGS-38653,OCPBUGS-38267,OCPBUGS-38693: Downstream Merge 14th August 2024 #2265

tssurya
Contributor

@tssurya tssurya commented Aug 19, 2024

depends on #2259

arkadeepsen and others added 30 commits August 1, 2024 18:58
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.9+incompatible to 25.0.6+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v24.0.9...v25.0.6)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
The issue was a race where the hybrid overlay node was being updated to
remove the Windows label for testing. However, the update itself was made
from a blank original copy of the node, which would overwrite the L3
gateway config and other OVNK annotations with empty values, causing a
number of errors.

This changes the code to patch only the labels, so that no other part of
the node object is corrupted.

Fixes: #4387

Signed-off-by: Tim Rozet <[email protected]>
Reduce linter time by mounting the golangci-lint local cache.

Signed-off-by: Or Mergi <[email protected]>
Add support for DNSNameResolver in KIND cluster
This PR does the following:

- Removes the following Linux resources if the masquerade subnet changes
  (node side):

  * Removes the old V4HostMasqueradeIP and V6HostMasqueradeIP from the bridge.

  * Removes stale neighbour entries for V4OVNMasqueradeIP, V6OVNMasqueradeIP,
V4DummyNextHopMasqueradeIP and V6DummyNextHopMasqueradeIP, if they exist.

  * Removes the stale masquerade route added by the addMasqueradeRoute()
function at gateway startup.

  * Removes stale iptables rules created for the masquerade subnet, based
on ipForwarding and the gateway mode.

- Removes the following OVN resources if the masquerade subnet changes
  (ovnkube-controller to NBDB side):

  * Removes the logical router static route used by the gateway router
that references the old masquerade subnet.

  * Removes the static MAC binding for the gateway router's rtoe logical
port that references the old masquerade subnet.

Note that the node now sets an annotation recording the masquerade subnet
it last configured. At startup, the node uses this annotation to determine
whether the subnet has changed and cleanup is needed.

On the ovnkube-controller side, the annotation is also used to determine
whether the node's masquerade subnet has changed. However, relying on it
alone may be racy, because the node thread may already have updated the
annotation by the time ovnkube-controller handles the cleanup. Therefore,
in addition to checking the annotation, ovnkube-controller scans NBDB for
stale routes and derives the routes and MAC bindings to remove from them.
To facilitate this, the masquerade route now carries an external_id (the
same key used in the annotation) that identifies it as a masquerade
route.

Failing to delete a stale resource is not usually fatal for OVNK.
Therefore, if cleanup fails, the error is logged but startup continues.

Finally, kind.sh is updated to use a larger masquerade subnet by
default. OVN-Kubernetes defaults themselves remain unchanged. Helm
has also been updated to use a larger subnet.

Co-authored-by: Tim Rozet <[email protected]>
Signed-off-by: Arnab Ghosh <[email protected]>
Support masquerade subnet to be custom configurable as a day2 operation
fix: link to kind usage docs in contrib README.md
Signed-off-by: Martin Kennelly <[email protected]>
Add test cases to cover this scenario.

Signed-off-by: Martin Kennelly <[email protected]>
In order to do that, factor out the code that allocates join IPs and
creates the join switch and move it to the base network controller.

A follow up commit will move both the `createJoinSwitch` and
`createClusterRouter` functions away from the default network
controller, to a different struct / pkg so it can be used by the
controllers that require that particular topology (default net and
secondary L3 nets).

Co-authored-by: Miguel Duarte Barroso <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
To minimize changes to the existing unit tests, we keep the default
network as it was, only renaming its sync method.

The default network sync method will in turn invoke the common
gatewayManager sync function, which will do all the heavy work.

Co-authored-by: Dumitru Ceara <[email protected]>
Signed-off-by: Miguel Duarte Barroso <[email protected]>
This commit is only code plumbing; the actual gw sync function is not
invoked yet, because there is still no way to gather the required
inputs.

Follow-up commits will add these.

Co-authored-by: Dumitru Ceara <[email protected]>
Signed-off-by: Miguel Duarte Barroso <[email protected]>
It will be useful in the case of multiple networks that support egress.

Signed-off-by: Dumitru Ceara <[email protected]>
This commit builds the GW configuration from multiple sources:
- NAD
- node annotations

The masquerade IPs are generated from the network ID available on the
nodes, which is unique for each network, thus guaranteeing that each
network also has unique masquerade IPs for it.

Co-authored-by: Enrique Llorente <[email protected]>
Co-authored-by: Miguel Duarte Barroso <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
hack/lint.sh: Mount local golangci-lint cache
This commit refactors the existing code, moving the join switch /
cluster router creation away from the default network controller. This
is done because the layer2 controllers have no need for this type of
topology, and were able to do so previously. This will also make it
simpler to unit test the join switch / cluster router creation.

We ensure these logical entities have their respective network name in
the external IDs, so a network controller can filter the entities for
the network it manages.

The provided struct is properly unit tested.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
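Filtering by the network name stored in the external IDs could look like the Go sketch below. `logicalEntity` is a simplified stand-in for NBDB rows, and both the key `example.org/network` and the entity names are hypothetical placeholders rather than the identifiers OVN-Kubernetes uses.

```go
package main

import "fmt"

// networkNameKey is a placeholder for the external-ID key that records which
// network owns a logical entity.
const networkNameKey = "example.org/network" // hypothetical

// logicalEntity is a simplified stand-in for an NBDB row such as a
// Logical_Switch (join switch) or Logical_Router (cluster router).
type logicalEntity struct {
	Name        string
	ExternalIDs map[string]string
}

// newEntity stamps the owning network into the external IDs at creation time.
func newEntity(name, network string) logicalEntity {
	return logicalEntity{Name: name, ExternalIDs: map[string]string{networkNameKey: network}}
}

// ownedBy returns only the entities belonging to network, so a controller
// never syncs or deletes another network's join switch or cluster router.
func ownedBy(network string, all []logicalEntity) []logicalEntity {
	var out []logicalEntity
	for _, e := range all {
		if e.ExternalIDs[networkNameKey] == network {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	all := []logicalEntity{
		newEntity("join", "default"),
		newEntity("ovn_cluster_router", "default"),
		newEntity("blue_join", "blue"),
		newEntity("blue_ovn_cluster_router", "blue"),
	}
	for _, e := range ownedBy("blue", all) {
		fmt.Println(e.Name)
	}
}
```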
We need to use the NAD `networkName` attribute, which will trigger this
particular traffic to be sent via a dedicated patch port.

Signed-off-by: Miguel Duarte Barroso <[email protected]>
@openshift-ci-robot
Contributor

@tssurya: This pull request references Jira Issue OCPBUGS-38653, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-37056 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-37056 targets the "4.18.0" version, which is one of the valid target versions: 4.18.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

This pull request references Jira Issue OCPBUGS-38267, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-37439 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-37439 targets the "4.18.0" version, which is one of the valid target versions: 4.18.0
  • bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

This pull request references Jira Issue OCPBUGS-38693, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-37433 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-37433 targets the "4.18.0" version, which is one of the valid target versions: 4.18.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@maiqueb
Contributor

maiqueb commented Aug 20, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 20, 2024
Contributor

openshift-ci bot commented Aug 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maiqueb, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tssurya
Contributor Author

tssurya commented Aug 20, 2024

We have payload results from https://pr-payload-tests.ci.openshift.org/runs/ci/4219ce90-5eaf-11ef-884f-6a25e6f62e3b-0 and https://pr-payload-tests.ci.openshift.org/runs/ci/4219ce90-5eaf-11ef-884f-6a25e6f62e3b-1.

They look good except for the stable upgrades:

  1. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregator-periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1825756468448595968
  2. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregator-periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade/1825756494457475072
  3. https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregator-periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade/1825756601001185280

all 3 are showing errors that say:

Could not fetch backend disruption data for 1825756470193426432
Could not fetch backend disruption data for 1825756477676064768
Could not fetch backend disruption data for 1825756484391145472
Could not fetch backend disruption data for 1825756491940892672
Could not fetch backend disruption data for 1825756481874563072
Could not fetch backend disruption data for 1825756489432698880
Could not fetch backend disruption data for 1825756486903533568
Could not fetch backend disruption data for 1825756475172065280
Could not fetch backend disruption data for 1825756472693231616

I am not sure whether that means these results are inconclusive.
However, the 4.16 to 4.17 stable upgrade presubmits passed on aws and gcp-ovn-rt, so I am marking this as risk-assessed.

@tssurya
Contributor Author

tssurya commented Aug 20, 2024

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed label Aug 20, 2024
@jechen0648

/ocpbugs cc-qa

@jechen0648

/label qe-approved

@jechen0648

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved label Aug 20, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 8c1e320 into openshift:release-4.17 Aug 20, 2024
32 of 34 checks passed
@openshift-ci-robot
Contributor

@tssurya: Jira Issue OCPBUGS-38653: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38653 has been moved to the MODIFIED state.

Jira Issue OCPBUGS-38267: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38267 has been moved to the MODIFIED state.

Jira Issue OCPBUGS-38693: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38693 has been moved to the MODIFIED state.

In response to this:

depends on #2259

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-base
This PR has been included in build ose-ovn-kubernetes-base-container-v4.17.0-202408201939.p0.g8c1e320.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ovn-kubernetes-microshift
This PR has been included in build ovn-kubernetes-microshift-container-v4.17.0-202408201939.p0.g8c1e320.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-ovn-kubernetes
This PR has been included in build ose-ovn-kubernetes-container-v4.17.0-202408201939.p0.g8c1e320.assembly.stream.el9.
All builds following this will include this PR.

@tssurya tssurya changed the title OCPBUGS-38653,OCPBUGS-38267,OCPBUGS-38693: Downstream Merge 14th August 2024 SDN-4919,OCPBUGS-38653,OCPBUGS-38267,OCPBUGS-38693: Downstream Merge 14th August 2024 Aug 21, 2024
@openshift-ci-robot
Contributor

@tssurya: Jira Issue OCPBUGS-38653: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38653 has been moved to the MODIFIED state.

Jira Issue OCPBUGS-38267 is in an unrecognized state (Verified) and will not be moved to the MODIFIED state.

Jira Issue OCPBUGS-38693 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

In response to this:

depends on #2259

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.