

Sdn 1185 new #54

Open · wants to merge 89 commits into base: master
Conversation

cathy-zhou (Owner)

- What this PR does and why is it needed

- Special notes for reviewers

- How to verify it

- Description for the changelog

andreaskaris and others added 30 commits November 2, 2022 19:27
Improve kubectl_wait_pods:
* use kubectl rollout to wait for specific DaemonSets
* use kubectl wait pods to wait for specific pods
* respect total timeout

Signed-off-by: Andreas Karis <[email protected]>
Apply retry to the namespace controller in ovnk node that, upon a namespace update, syncs UDP conntrack entries.

Signed-off-by: Riccardo Ravaioli <[email protected]>
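
A minimal sketch of this retry pattern (the helper and function names below are illustrative, not ovnk's actual retry framework):

package main

import (
    "fmt"
    "time"
)

// retrySync retries fn at a fixed interval until it succeeds or the
// attempts are exhausted, returning the last error.
func retrySync(attempts int, interval time.Duration, fn func() error) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        time.Sleep(interval)
    }
    return fmt.Errorf("sync failed after %d attempts: %w", attempts, err)
}

// syncUDPConntrackForNamespace is a stand-in for the conntrack sync that
// runs on a namespace update.
func syncUDPConntrackForNamespace(ns string) error { return nil }

func main() {
    // Transient conntrack failures are retried instead of being dropped.
    fmt.Println(retrySync(5, 2*time.Second, func() error {
        return syncUDPConntrackForNamespace("test-ns")
    }))
}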
…ange

Command used:
$ mockery --name NodeWatchFactory    --dir ~/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory/ --output ~/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory/mocks/

04 Oct 22 19:09 CEST INF Starting mockery dry-run=false version=v2.14.0
04 Oct 22 19:09 CEST INF Walking dry-run=false version=v2.14.0
04 Oct 22 19:09 CEST INF Generating mock dry-run=false interface=NodeWatchFactory qualified-name=github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory version=v2.14.0

Signed-off-by: Riccardo Ravaioli <[email protected]>
Add the option to deploy IPsec encryption on kind.

Signed-off-by: Andreas Karis <[email protected]>
When changing the hybrid overlay to dynamically allocated IP addresses
there were instances where the .3 address was blindly returned. Correct
that mistake.

Signed-off-by: Jacob Tanenbaum <[email protected]>
correct the hybrid overlay port MAC address
This PR reverts commits 82094e2 and df3c905.
This will pave the way for using the 'options:hairpin_snat_ip'
feature of OVN load balancers.

Signed-off-by: Surya Seetharaman <[email protected]>
Co-authored-by: Andrew Stoycos <[email protected]>
This commit adds a new masqueradeIP which can be
used to tell OVN to explicitly SNAT towards that
masqueradeIP for hairpin service traffic.

Signed-off-by: Surya Seetharaman <[email protected]>
Co-authored-by: Andrew Stoycos <[email protected]>
This commit adds the hairpin SNAT masqueradeIP to
all the load balancers via the 'options:hairpin_snat_ip'
field in OVN load balancers. This will help us in
handling the NP service hairpin traffic correctly.

Example:

_uuid               : c64e4c7b-66f7-4739-a633-06791efe835a
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="surya5/hello-world-net-pol"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_surya5/hello-world-net-pol_TCP_cluster"
options             : {event="false", hairpin_snat_ip="169.254.169.5 fd69::5", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.1.35:80"="10.244.1.3:8080,10.244.2.3:8080", "[fd00:10:96::a6a6]:80"="[fd00:10:244:2::3]:8080,[fd00:10:244:3::3]:8080"}

NOTE: This might not be needed in the GR LBs, but no harm
in adding it to all, so that we continue to leverage LBGs.

An e2e test is added that demonstrates that the hairpin traffic
will have the fixed masqueradeIP as the srcIP in its response packet.
This commit also consolidates pokeEndpointHostName and pokeEndpointClientIP
into a single function since they were both doing more or less the
same logic.

Signed-off-by: Surya Seetharaman <[email protected]>
Co-authored-by: Andrew Stoycos <[email protected]>
This PR adds the logic to allow or deny traffic correctly
in the podA->svc->podA case. If allow-from-podA is specified
then the hairpin traffic should be allowed; if podA is not in
the allow list then the hairpin traffic must not be let back in.

The logic is to add an extra match to all existing ingress
ACLs that match on namespace or pod selectors. The result
looks like this:

_uuid               : 6f2566ce-9236-4bee-896c-be56a91aba16
action              : allow-related
direction           : to-lport
external_ids        : {Ingress_num="0", ipblock_cidr="false", l4Match=None, namespace=surya5, policy=allow-ingress-to-foo4-from-foo2, policy_type=Ingress}
label               : 0
log                 : false
match               : "((ip4.src == {$a6612447164629805900} || (ip4.src == 169.254.169.5 && ip4.dst == {$a6612447164629805900})) || (ip6.src == {$a6612449363653062322} || (ip6.src == fd69::5 && ip6.dst == {$a6612449363653062322}))) && outport == @a14627396333488653719"
meter               : acl-logging
name                : surya5_allow-ingress-to-foo4-from-foo2_0
options             : {}
priority            : 1001
severity            : []

The match here checks whether:
1) traffic is from the allowed-list address set, OR
2) traffic is hairpin traffic destined towards the allowed-list address set,
for both the v4 and v6 cases.

The second point is what enables service hairpinning to behave correctly,
since for hairpin traffic the destination is the same as the source.

Signed-off-by: Surya Seetharaman <[email protected]>
Co-authored-by: Andrew Stoycos <[email protected]>
Co-authored-by: Nadia Pinaeva <[email protected]>
OCPBUGS-2868: Ignore addresses in masquerade subnet when retrieving gateway IPs
Don't log in iterateRetryResources when there are no retry entries
When nextQueryTime for a DNS entry has already expired, the update goroutine
tries to reset the ticker with a negative duration, which crashes the process
on the ovnk master. This change avoids the crash and triggers the ticker in the
shortest possible time (1 ms) so that the update happens immediately for the
expired DNS entry.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
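
A minimal sketch of the guard described above; the variable names are illustrative, but the core fact is real: time.Ticker.Reset panics when given a duration <= 0:

package main

import (
    "fmt"
    "time"
)

func main() {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()

    // For an already-expired DNS entry, nextQueryTime is in the past.
    nextQueryTime := time.Now().Add(-time.Minute)

    // time.Ticker.Reset panics on a non-positive duration, which is what
    // crashed the ovnk master. Clamp to the shortest positive interval so
    // the update fires immediately instead.
    d := time.Until(nextQueryTime)
    if d <= 0 {
        d = time.Millisecond
    }
    ticker.Reset(d)
    fmt.Println("ticker reset to", d)
}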
…l-dns-update-panic

Handle expired entry while handling dns update
When there are multiple addresses on the primary interface for a node,
conntrack uses the first one by default. During upgrade that ends
up being the masquerade address because it was added last. This
causes connection problems because the masquerade address ends up in
a dadfailed state because we use the same address on all nodes. Since
we don't care about DAD on that address, let's disable it so the
address doesn't get into a bad state.

Signed-off-by: Ben Nemec <[email protected]>
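
A minimal sketch of disabling DAD for a single address using the vishvananda/netlink package; the interface name and masquerade address below are assumptions for illustration:

package main

import (
    "net"

    "github.com/vishvananda/netlink"
    "golang.org/x/sys/unix"
)

func main() {
    link, err := netlink.LinkByName("breth0") // assumed gateway interface
    if err != nil {
        panic(err)
    }
    ip, ipnet, err := net.ParseCIDR("fd69::2/128") // assumed v6 masquerade IP
    if err != nil {
        panic(err)
    }
    ipnet.IP = ip
    // IFA_F_NODAD disables duplicate address detection for this address, so
    // the masquerade IP shared by all nodes never enters the dadfailed state.
    addr := &netlink.Addr{IPNet: ipnet, Flags: unix.IFA_F_NODAD}
    if err := netlink.AddrAdd(link, addr); err != nil {
        panic(err)
    }
}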
…airpin

Fix Network Policy Service Hairpin Traffic
…pods

Hold lock when deleting completed pod during update event
…ases

If real conntrack operations are performed against the CI host we may
get operation errors, because we're not running a real node in the testcases
and we fake out all the iptables stuff too. We shouldn't mix real operations
in the CI host's netns with fake operations in the testcases, because then we
cannot control the test parameters.

Second, don't start the node IP handler's netlink watch or sync host IPs
in testcases because it scrapes the CI host's IP addresses which service
deletion uses when clearing conntrack, which the testcases know nothing
about. Just stick with the fake node IP that the testcases already add.

This all fixes errors like:

• Failure [0.114 seconds]
Node Operations
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:161
  on delete
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:1211
    deletes iptables rules with ExternalIP [It]
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:1212

    Unexpected error:
        <*errors.errorString | 0xc000de8cb0>: {
            s: "DeleteService failed for nodePortWatcher: failed to delete conntrack entry for service namespace1/service1: failed to delete conntrack entry for service namespace1/service1 with svcVIP 1.1.1.1, svcPort 8032, protocol TCP: operation not permitted",
        }
        DeleteService failed for nodePortWatcher: failed to delete conntrack entry for service namespace1/service1: failed to delete conntrack entry for service namespace1/service1 with svcVIP 1.1.1.1, svcPort 8032, protocol TCP: operation not permitted
    occurred

    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:1243
------------------------------
Node Operations on delete
  deletes iptables rules for NodePort
  /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:1295

Signed-off-by: Dan Williams <[email protected]>
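
A minimal sketch of the mocking approach, assuming a small interface seam between the service handler and the conntrack calls (names are illustrative, not ovnk's exact ones):

package main

import "fmt"

// conntracker abstracts the conntrack delete used on service deletion so
// tests can swap in a fake instead of touching the CI host's netns.
type conntracker interface {
    DeleteEntry(vip string, port int, protocol string) error
}

// fakeConntracker records calls instead of issuing real netlink conntrack
// operations, which would fail with "operation not permitted" in CI.
type fakeConntracker struct {
    deleted []string
}

func (f *fakeConntracker) DeleteEntry(vip string, port int, protocol string) error {
    f.deleted = append(f.deleted, fmt.Sprintf("%s:%d/%s", vip, port, protocol))
    return nil
}

func main() {
    var ct conntracker = &fakeConntracker{}
    _ = ct.DeleteEntry("1.1.1.1", 8032, "TCP")
    fmt.Println(ct.(*fakeConntracker).deleted) // assert on this in tests
}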
We need to wait until our client-side shared informer (and thus watch
factory) has seen the annotation update before we can delete the pod,
otherwise the object retry code may not get the update.

Signed-off-by: Dan Williams <[email protected]>
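
A minimal sketch of waiting for the annotation to land before deleting the pod, demonstrated with a fake clientset; the helper name is illustrative:

package main

import (
    "context"
    "fmt"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/kubernetes/fake"
)

// waitForAnnotation polls until the pod carries the expected annotation, so
// a subsequent delete is guaranteed to be observed after the update.
func waitForAnnotation(client kubernetes.Interface, ns, name, key string) error {
    return wait.PollImmediate(50*time.Millisecond, 5*time.Second, func() (bool, error) {
        pod, err := client.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
        if err != nil {
            return false, err
        }
        _, ok := pod.Annotations[key]
        return ok, nil
    })
}

func main() {
    client := fake.NewSimpleClientset(&corev1.Pod{ObjectMeta: metav1.ObjectMeta{
        Namespace:   "ns1",
        Name:        "pod1",
        Annotations: map[string]string{"k8s.ovn.org/pod-networks": "{}"},
    }})
    fmt.Println(waitForAnnotation(client, "ns1", "pod1", "k8s.ovn.org/pod-networks"))
}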
node: mock conntrack delete operations for service and EgressIP testcases
Fixes a scenario where a pod exists but its node does not, and upon ovnk
master reboot our sync would fail. During sync we should attempt to
preallocate all existing pod IPs, including those of terminating pods with
finalizers. Pods in terminating state should still retain networking
access.

Signed-off-by: Tim Rozet <[email protected]>
kind.sh: Add IPsec option to kind deployments
Handle cases where a pod may exist but the node does not
martinkennelly and others added 29 commits December 2, 2022 10:40
Dockerfile.fedora: bump OVN to 22.09.0-22
…swithnolsatstartup

Skip cleaning LSPs when LS doesn't exist
* Add help text for ipsec option to kind.sh
* Fix OpenSSL version check
* Run certificate signer in foreground
* In ovn-ipsec.yaml.j2 make the counter increment the same as in
  downstream cluster-network-operator

Signed-off-by: Andreas Karis <[email protected]>
…22-12-02

kind: Fix ipsec in kind.sh and fix counter expression
There can be multiple management ports. Currently, in the case of Full
mode with a hardware-backed mgmt port, the virtual function representor
and the virtual function netdev exist separately, but the management
port config belongs to the virtual function representor in this case.
This caused the Egress IP health check to fail to initialize, since
there is no IP address on the virtual function representor
interface. Instead, the code now iterates through all management port
interfaces, skips the virtual function representor, and finds the virtual
function netdev management port with a valid IP address configuration.

In the case of DPU, there will be a DPU-Host mode node and a DPU mode
node, and hardware-backed mgmt ports are always used. The DPU mode
node will only have a virtual function representor, so the Egress IP
health check won't run on it. The DPU-Host mode node will only
have a virtual function netdev, so the Egress IP health check will
run on it.

Reported-at: https://issues.redhat.com/browse/OCPBUGS-4098
Signed-off-by: William Zhao <[email protected]>
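
A minimal sketch of the selection loop, with simplified stand-in types for ovnk's management port configs:

package main

import "fmt"

// mgmtPort is a simplified stand-in for a management port config.
type mgmtPort struct {
    name          string
    isRepresentor bool
    hasIP         bool
}

// pickHealthCheckPort iterates all management ports, skips the virtual
// function representor (which carries no IP), and returns the VF netdev
// with a valid IP for the Egress IP health check. The second return value
// is false on e.g. a DPU-mode node, where the health check won't run.
func pickHealthCheckPort(ports []mgmtPort) (mgmtPort, bool) {
    for _, p := range ports {
        if p.isRepresentor || !p.hasIP {
            continue
        }
        return p, true
    }
    return mgmtPort{}, false
}

func main() {
    ports := []mgmtPort{
        {name: "enp3s0f0_0", isRepresentor: true, hasIP: false},  // VF representor
        {name: "ovn-k8s-mp0", isRepresentor: false, hasIP: true}, // VF netdev
    }
    fmt.Println(pickHealthCheckPort(ports))
}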
Mainly to pickup the fix for the session affinity timeout bug.

Signed-off-by: Numan Siddique <[email protected]>
Existing OVN code can be shared by the default network controller and
the upcoming secondary layer 3 network controller. Update it to get
ready for the secondary layer 3 network support.

Signed-off-by: Yun Zhou <[email protected]>
Dockerfile.fedora: bump OVN to 22.09.0-25
This PR adds support for session affinity
timeout, which is required for the OVNK plugin to
become CNCF compliant and pass the conformance
tests around this feature.

NOTE: Most of the work has been done in OVN;
the only bit OVNK does is enable this in the
right manner as an option on the load balancer
when a service with session affinity is created.
The OVN version bump is being done in the fedora
image used for the build.

NOTE2: We used to use the selectionFields option
before; this is no longer needed because there can
never be session affinity without a timeout set
(defaults to 3hrs), so if affinity-timeout is set
then OVN does the rest of the logic to determine
stickiness.

NOTE3: Is the stickiness timeout calculated for
packets or sessions? (can't remember what IPVS
and OVN do) :( so do we time out after
the last packet?

Signed-off-by: Surya Seetharaman <[email protected]>
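
A minimal sketch of how a Service's affinity timeout could map onto the load balancer options (assuming the OVN option is named affinity_timeout, as introduced in OVN 22.09; the surrounding shape is a simplified stand-in for ovnk's LB builder):

package main

import (
    "fmt"
    "strconv"
)

// buildLBOptions maps a Service's session affinity timeout (seconds) onto
// OVN load balancer options. A zero timeout means no session affinity.
func buildLBOptions(affinityTimeoutSeconds int32) map[string]string {
    opts := map[string]string{
        "event":     "false",
        "reject":    "true",
        "skip_snat": "false",
    }
    if affinityTimeoutSeconds > 0 {
        // Session affinity always carries a timeout (Kubernetes defaults to
        // 3h = 10800s), so setting the timeout alone is enough; OVN does the
        // rest of the stickiness logic.
        opts["affinity_timeout"] = strconv.Itoa(int(affinityTimeoutSeconds))
    }
    return opts
}

func main() {
    fmt.Println(buildLBOptions(10800))
}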
…ty-timeout

Support LB Session Affinity TimeOut
make code shareable for secondary network controller
Fix FakeAddressSetFactory.ProcessEachAddressSet locking to allow calling
other FakeAddressSetFactory methods in iteratorFn.

Signed-off-by: Nadia Pinaeva <[email protected]>
Currently there is a race condition during startup: a pod can be
assigned an address that was previously assigned to a node's hybrid
overlay distributed router IP.

Between syncPods() completing and the node's AddNode() event retrying,
an incoming addPod() can be allocated the address that was assigned to
the node for hybrid overlay.

Fix that by assigning the existing hybrid overlay DRIP addresses during
syncPods(). Since we do not support a user manually editing the
hybrid overlay DRIP address or pod IPs, any IP address once set as the
hybrid overlay DRIP address will always be valid.

Signed-off-by: Jacob Tanenbaum <[email protected]>
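
A minimal sketch of the fix, with an illustrative allocator in place of ovnk's logical switch manager:

package main

import "fmt"

// ipAllocator is an illustrative stand-in for the per-switch IP allocator.
type ipAllocator struct {
    reserved map[string]bool
}

// Reserve marks an IP as taken; syncPods() calls this for every node's
// hybrid overlay DRIP before any pod events are processed.
func (a *ipAllocator) Reserve(ip string) {
    a.reserved[ip] = true
}

// Allocate fails for an IP that was already reserved, so an incoming
// addPod() can no longer grab a node's DRIP during the startup window.
func (a *ipAllocator) Allocate(ip string) error {
    if a.reserved[ip] {
        return fmt.Errorf("ip %s already reserved", ip)
    }
    a.reserved[ip] = true
    return nil
}

func main() {
    alloc := &ipAllocator{reserved: map[string]bool{}}
    alloc.Reserve("10.244.1.3")               // DRIP reserved during syncPods()
    fmt.Println(alloc.Allocate("10.244.1.3")) // racing addPod() now fails fast
}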
When pod updates happen, handlers may be invoked that add pod IPs to
address sets. A prime example of this is network policy handlePeerPods.
Each time the pod is updated, the address will be added to the address
set with a mutate-insert operation. The cost of this txn when there are
many network policies that apply to this pod is high. For example, with
a pod that has 730 network policies, 7 pod updates would result
in 5110 transactions instead of only the 730 needed to add this pod in
the first place.

Signed-off-by: Tim Rozet <[email protected]>
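
A minimal sketch of the deduplication idea, with a simplified in-memory view standing in for ovnk's address set cache:

package main

import "fmt"

// addressSet keeps an in-memory view of the IPs already present so repeated
// pod updates do not generate redundant OVSDB mutate transactions.
type addressSet struct {
    ips map[string]struct{}
}

// addIPs returns only the IPs not yet in the set; in the controller these
// would go into a single mutate-insert transaction, and an empty result
// means no transaction needs to be sent at all.
func (as *addressSet) addIPs(ips []string) []string {
    var newIPs []string
    for _, ip := range ips {
        if _, ok := as.ips[ip]; !ok {
            as.ips[ip] = struct{}{}
            newIPs = append(newIPs, ip)
        }
    }
    return newIPs
}

func main() {
    as := &addressSet{ips: map[string]struct{}{}}
    fmt.Println(as.addIPs([]string{"10.244.1.3"})) // first update: txn needed
    fmt.Println(as.addIPs([]string{"10.244.1.3"})) // later updates: no txn
}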
Checks the cache to decide whether we actually need to send a
transaction to add the port. This results in a perf gain by not
wasting OVSDB transactions.

Signed-off-by: Tim Rozet <[email protected]>
Eliminates duplicate transactions for adding to address sets and port groups,
to avoid ovn-controller syntax errors.

Signed-off-by: Nadia Pinaeva <[email protected]>
LinkRoutesAddOrUpdateSourceOrMTU/LinkRoutesApply: update route GW if it was changed
Use a pod handler with an empty namespace selector to reduce the number
of per-namespace pod handlers. We don't need per-namespace handlers in
the case of an empty namespace selector; just use the pod selector with
an empty namespace.

Signed-off-by: Nadia Pinaeva <[email protected]>
…found-fast-fix

Fix address set cleanup: only delete address sets owned by given object.
…selector

Use PeerPodHandler if namespaceAndPod selector has empty namespace
…ric_desc_pod_creation

Metrics: update pod creation latency metric description
…ems.

Commit 09af653 moved the LRP creation from ensureNodeLogicalNetwork to
syncNodeClusterRouterPort, so that when creating the LRP the gateway-chassis
can be set, to pin the LS to a chassis. However, commit c58554a
accidentally reverted the change when converting nbctl to libovsdb (it is
likely a rebase problem). It creates the LRP in ensureNodeLogicalNetwork
without setting gateway-chassis, which ends up with all existing
ovn-controllers changing their monitor condition to monitor the new LS
datapath. Later, when ovnkube-node annotates the chassis-id, ovnkube-master
calls syncNodeClusterRouterPort, which updates the LRP and sets the
gateway-chassis, so all the ovn-controllers (except the new node's) change
their monitor condition again to remove the LS.

Such monitor condition changes come from every existing node, so the SB
DB becomes extremely busy refiltering all the existing lflows/PBs for all
the monitor conditions - an O(M x N) problem. Even if conditional
monitoring is disabled, this causes unnecessary churn for all the
ovn-controllers.

This patch fixes the problem by:
1. Removing the LRP creation from ensureNodeLogicalNetwork.
2. Combining the LRP creation and the gateway-chassis setting into the same
   transaction. For this purpose the related APIs in router.go are
   refactored/simplified.

Signed-off-by: Han Zhou <[email protected]>
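
A minimal sketch of the second point: building the LRP and its gateway-chassis as operations committed in one transaction (the op type is a simplified stand-in for libovsdb operations, and the names are illustrative):

package main

import "fmt"

// op is a simplified stand-in for a libovsdb operation.
type op struct {
    desc string
}

// createLRPOps returns the LRP insert together with its gateway-chassis
// insert so both commit atomically; ovn-controllers never observe an
// intermediate LRP without a chassis, and so never refilter twice.
func createLRPOps(lrp, chassis string) []op {
    return []op{
        {desc: fmt.Sprintf("insert Logical_Router_Port %s", lrp)},
        {desc: fmt.Sprintf("insert Gateway_Chassis %s for %s", chassis, lrp)},
    }
}

func main() {
    // Both operations go into a single transact call.
    for _, o := range createLRPOps("rtos-node1", "node1") {
        fmt.Println(o.desc)
    }
}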
…-master.v2

fix hybrid overlay allocation on the master
…le-creating-lrp

Set gateway-chassis when LRP is being created, to avoid scaling problems
This reverts commit dabaf9f.

Original PR never passed control plane tests.

Signed-off-by: Tim Rozet <[email protected]>
Revert "fix hybridOverlay DRIP address allocation"
Today, a port group is created per network policy, even if policies share
the same local pod selector. This means that the logical port of the same
selected pod will be added to/deleted from multiple port groups.

This commit adds support for sharing port groups among multiple
network policies that have the same local pod selector.
Signed-off-by: Yun Zhou <[email protected]>
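
A minimal sketch of deriving a shared port group name from the local pod selector (the naming scheme is illustrative, not ovnk's actual one):

package main

import (
    "crypto/sha256"
    "fmt"
)

// portGroupName derives a stable name from the namespace and the canonical
// string form of the policy's local pod selector, so policies with the same
// selector share one port group.
func portGroupName(namespace, podSelector string) string {
    h := sha256.Sum256([]byte(namespace + "/" + podSelector))
    return fmt.Sprintf("pg_%x", h[:8])
}

func main() {
    // Two policies with identical selectors map to the same port group, so
    // each selected pod's logical port is added only once.
    fmt.Println(portGroupName("surya5", "app=foo"))
    fmt.Println(portGroupName("surya5", "app=foo"))
}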
@coveralls

Pull Request Test Coverage Report for Build 3698092772

  • 2308 of 3758 (61.42%) changed or added relevant lines in 55 files are covered.
  • 52 unchanged lines in 16 files lost coverage.
  • Overall coverage increased (+0.7%) to 54.006%

Changes Missing Coverage                           Covered Lines  Changed/Added Lines  %
go-controller/pkg/ovn/gateway_init.go              19             20                   95.0%
go-controller/pkg/util/node_annotations.go         9              10                   90.0%
go-controller/pkg/ovn/address_set/address_set.go   14             16                   87.5%
go-controller/pkg/ovn/egressfirewall_dns.go        4              6                    66.67%
go-controller/pkg/ovn/gateway_cleanup.go           5              7                    71.43%
go-controller/pkg/ovn/namespace.go                 38             40                   95.0%
go-controller/pkg/ovn/ovn.go                       34             36                   94.44%
go-controller/pkg/util/iptables.go                 0              2                    0.0%
go-controller/pkg/kube/healthcheck/healthcheck.go  4              7                    57.14%
go-controller/pkg/node/management-port.go          0              3                    0.0%
Files with Coverage Reduction                                           New Missed Lines  %
go-controller/pkg/node/healthcheck.go                                   1                 39.47%
go-controller/pkg/ovn/egressfirewall_dns.go                             1                 79.71%
go-controller/pkg/ovn/loadbalancer/loadbalancer.go                      1                 0%
go-controller/hybrid-overlay/pkg/controller/node_linux.go               2                 60.08%
go-controller/pkg/node/gateway.go                                       2                 43.23%
go-controller/pkg/ovn/egressip.go                                       2                 63.05%
go-controller/pkg/ovn/logical_switch_manager/logical_switch_manager.go  2                 37.26%
go-controller/pkg/node/gateway_iptables.go                              3                 86.15%
go-controller/pkg/node/port_claim.go                                    3                 55.41%
go-controller/pkg/ovn/pods.go                                           3                 76.61%
Totals Coverage Status
Change from base Build 3433309980: 0.7%
Covered Lines: 16973
Relevant Lines: 31428

💛 - Coveralls
