Skip setting VLAN ID when VLAN is 0 - breaks on VF trunk policy #291
I see some work was done in #133.
Hi @RefluxMeds, thanks for the input. Yes, please, if you can explain more about the use case that would be great! Is the trunk with allowed VLANs a driver-specific configuration?
Introduction

Hello @SchSeba! We're trying to solve a specific security issue in our infrastructure: prohibiting Pod workloads and other host processes from provisioning any VLAN ID they want on assigned virtual functions. This issue stems from our network fabric setup, but I cannot get into details here as it would be too revealing. This feature would also bring our platform into compliance with the company-wide security policy, so I am really, really interested in pursuing it. We are using Mellanox ConnectX-6 (Lx/Dx) and ConnectX-5 NICs on all our servers. These NICs provide a really cool feature we would like to use, however it is not supported by the current implementation of sriov-cni.

Issue description

To describe the issue, I will post a somewhat lengthy explainer on how we got here. I apologize in advance if it is too long. Let us say we have two physical functions (PFs), with 8 virtual functions (VFs) per PF:
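(The commands below are an illustrative sketch; the original listing was not preserved. PF names such as ens1f0/ens1f1 are placeholders, not taken from the original report.)

```sh
# Create 8 VFs on each PF (hypothetical PF names; adjust to your environment).
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs
echo 8 > /sys/class/net/ens1f1/device/sriov_numvfs

# The PF listing now shows one "vf N ..." line per virtual function.
ip link show ens1f0
ip link show ens1f1
```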
These VFs can be consumed by Pod workloads using the sriov-cni plugin, referenced through a NetworkAttachmentDefinition:
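(A minimal sketch of such a NetworkAttachmentDefinition; the name, device-plugin resource name and VLAN ID below are placeholders, not the original values. There is no IPAM section, since addresses are assigned manually inside the Pod, as described further down.)

```sh
kubectl apply -f - <<'EOF'
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-vlan100                                            # hypothetical name
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/sriov_netdevice  # hypothetical device-plugin resource
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "vlan": 100
  }'
EOF
```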
In the Deployment spec, I refer to this NetworkAttachmentDefinition to configure the VF and move it into the Pod network namespace:
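(Roughly, the relevant part of the Deployment's Pod template would look like the following; the names match the placeholders used in the sketch above.)

```sh
# Excerpt of the Pod template, shown here only for illustration.
cat <<'EOF'
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-vlan100        # reference to the NetworkAttachmentDefinition
spec:
  containers:
  - name: app
    resources:
      requests:
        nvidia.com/sriov_netdevice: "1"               # hypothetical SR-IOV resource name
      limits:
        nvidia.com/sriov_netdevice: "1"
EOF
```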
The Pod gets spawned with the allocated VF; I can assign an IP and ping the gateway for that VLAN:
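(Sketch of the in-Pod check; the Pod name, addresses and gateway are placeholders.)

```sh
kubectl exec <pod> -- ip addr add 192.0.2.10/24 dev net1   # net1 is the VF attached by sriov-cni
kubectl exec <pod> -- ping -c 3 192.0.2.1                  # gateway for the VLAN set on the VF
```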
From the host side we can see that one VF is really assigned and all is well:
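(Sketch; PF name and VLAN ID are placeholders.)

```sh
# One of the "vf N ..." lines now shows the MAC assigned by sriov-cni and the configured VLAN.
ip link show ens1f0
```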
This is a standard way that you would consume a virtual function. However, this configuration matches what Nvidia calls VST, or VLAN Switch Tagging. It does not fit our use case, as the VF's VLAN is configurable both by whoever writes the NetworkAttachmentDefinition and by any privileged host process via netlink, with no administrative restriction on which VLAN IDs may be used.
The other way to consume a VF would be to treat it like a real trunk port and delegate VLAN tagging responsibilities to the container itself, while imposing an administrative trunk policy on the assigned VF. This configuration would match what Nvidia calls VGT or VLAN Guest Tagging. To configure this, we recreate the NetworkAttachmentDefinition without the VLAN ID and restart the Pod. But now we have to create a VLAN tag on the VF through the Pod in order to ping the gateway:
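(Sketch of the guest-side tagging; Pod name, VLAN ID and addresses are placeholders.)

```sh
kubectl exec <pod> -- ip link add link net1 name net1.100 type vlan id 100
kubectl exec <pod> -- ip link set net1.100 up
kubectl exec <pod> -- ip addr add 192.0.2.10/24 dev net1.100
kubectl exec <pod> -- ping -c 3 192.0.2.1
```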
This is where the VGT+ feature comes in. VGT+ is an advanced mode of VLAN Guest Tagging in which a VF is allowed to tag its own packets but is still subject to an administrative VLAN trunk policy. To configure the VF trunk policy on both PFs, we run the following:
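(The exact interface for VGT+ depends on the NIC firmware and MLNX_OFED/mlx5 driver version; the sysfs path and syntax below are an assumption for illustration only, so consult the Mellanox VGT+ documentation for your release.)

```sh
# Hypothetical per-VF sysfs node: allow VLAN 100 on VF 0 of each PF.
echo "add 100" > /sys/class/net/ens1f0/device/sriov/0/trunk
echo "add 100" > /sys/class/net/ens1f1/device/sriov/0/trunk
```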
The guest-tagged traffic with an allowed VLAN ID now passes, while traffic tagged with VLAN IDs outside the trunk policy is dropped.
We will add an additional allowed VLAN to the virtual function trunks and then configure the same VF with this second, now-allowed VLAN tag, to demonstrate that the policy indeed works:
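(Sketch, reusing the hypothetical sysfs interface and placeholder names from above.)

```sh
# On the host: allow a second VLAN on the VF trunk.
echo "add 200" > /sys/class/net/ens1f0/device/sriov/0/trunk

# Inside the Pod: tag with the newly allowed VLAN.
kubectl exec <pod> -- ip link add link net1 name net1.200 type vlan id 200
kubectl exec <pod> -- ip link set net1.200 up
```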
However, we have issues with this approach if the VF is already assigned to a Pod. When the 802.1Q trunk policy is changed, all tagged interfaces inside the Pod have to be recreated for the new trunk policy to apply. This means we either stop all traffic on that interface and recreate it, or we must set the trunk policies in advance (before application deployment). Setting the trunk policy before provisioning a VF has an undesired consequence: sriov-cni then fails to allocate the VF to the Pod, because it unconditionally sets a VLAN (VLAN 0 when none is configured) and that operation breaks once a VF trunk policy is in place.
The same is true for the equivalent netlink commands:
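(For illustration, the host-side VLAN assignment that sriov-cni performs is the netlink equivalent of the command below; per this report it returns an error once a trunk policy is active on the VF. PF name and VF index are placeholders.)

```sh
ip link set ens1f0 vf 0 vlan 0   # reportedly fails while an administrative trunk policy is configured
```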
Be mindful that we did not specify the VLAN ID in the NetworkAttachmentDefinition.

Why do we think that skipping VLAN assignment when it is 0 would solve the issue?

We tested this scenario manually by imposing an administrative trunk policy on all VFs, then moving a VF interface into the Pod network namespace and provisioning a forbidden VLAN and an allowed VLAN. First, let us configure the administrative trunk policy on the VFs:
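(Sketch of applying the policy to every VF of both PFs, again using the hypothetical sysfs interface and placeholder names from above.)

```sh
for pf in ens1f0 ens1f1; do
  for vf in $(seq 0 7); do
    echo "add 100" > "/sys/class/net/${pf}/device/sriov/${vf}/trunk"
  done
done
```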
Second, we create a Pod, but this time without the SR-IOV network annotation and resource request:
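(A minimal sketch of such a Pod; the name and image are placeholders. Note the absence of the k8s.v1.cni.cncf.io/networks annotation and of any SR-IOV resource request.)

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vgt-test                                   # hypothetical name
spec:
  containers:
  - name: shell
    image: registry.example.com/tools:latest       # placeholder image containing iproute2
    command: ["sleep", "infinity"]
EOF
```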
Let us go back to the host and try to find this Pod's network namespace.
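(One way to do this, assuming a node where crictl is available; the container ID is a placeholder.)

```sh
crictl ps | grep vgt-test                                         # find the Pod's container ID
pid=$(crictl inspect <container-id> | grep -m1 '"pid"' | tr -dc '0-9')
ls -l /proc/${pid}/ns/net                                         # the Pod's network namespace
```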
We found the correct network namespace. Now we can move any VF interface into it; the VF interfaces are visible on the host as regular netdevs with a predictable naming format:
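(Sketch; the VF netdev name below is hypothetical, use whatever name your host assigns to the VF.)

```sh
ip link set ens1f0v3 netns ${pid}   # move the VF netdev into the Pod's network namespace
```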
Perfect! We snatched a VF interface. Let's configure it as in the previous examples, from inside the Pod's namespace:
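(Sketch; interface names, VLAN IDs and addresses are placeholders. VLAN 100 is on the trunk policy, VLAN 999 is not.)

```sh
nsenter -t ${pid} -n ip link set ens1f0v3 up
nsenter -t ${pid} -n ip link add link ens1f0v3 name allowed0 type vlan id 100    # allowed by the trunk policy
nsenter -t ${pid} -n ip link add link ens1f0v3 name forbidden0 type vlan id 999  # not on the trunk policy
nsenter -t ${pid} -n ip link set allowed0 up
nsenter -t ${pid} -n ip link set forbidden0 up
nsenter -t ${pid} -n ip addr add 192.0.2.10/24 dev allowed0
nsenter -t ${pid} -n ip addr add 198.51.100.10/24 dev forbidden0
```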
The final test is to perform a ping from these interfaces to confirm that our theory is indeed correct:
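(Sketch; gateway addresses are placeholders. Per the trunk policy, only the allowed VLAN is expected to reach its gateway.)

```sh
nsenter -t ${pid} -n ping -c 3 -I allowed0 192.0.2.1       # allowed VLAN: expected to succeed
nsenter -t ${pid} -n ping -c 3 -I forbidden0 198.51.100.1  # forbidden VLAN: expected to be dropped
```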
To clean up, we delete the VLAN interfaces and return the VF interface to the root network namespace:
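(Sketch, with the same placeholder names as above.)

```sh
nsenter -t ${pid} -n ip link del allowed0
nsenter -t ${pid} -n ip link del forbidden0
nsenter -t ${pid} -n ip link set ens1f0v3 netns 1   # back to the root network namespace
```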
Proposed remediation for the issue / solution time!
Please provide guidance on this matter!
@mlguerrero12 can you take a look at this one and let me know? :)
@SchSeba, I think it's a valid request. We assign VLAN 0 when the vlan parameter is not set. We could (and should) change this. The idea is to not set VLAN 0 when the vlan parameter is not set, and to set VLAN 0 only when the config explicitly has the vlan parameter equal to 0. For both cases, we will continue having the restriction of not allowing qos or vlan proto when the vlan parameter is not set or 0. I'll work on this.
Fixes k8snetworkplumbingwg#291 Signed-off-by: Marcelo Guerrero <[email protected]>
@RefluxMeds, the fix is in #296. Please try to test it from your side.
Hi @mlguerrero12 , |
What issue would you like to bring attention to?
When using a trunk policy at the VF level, sriov-cni fails to allocate the VF to the Pod.
What is the impact of this issue?
Do you have a proposed response or remediation for the issue?
Either skip applying the VLAN config when VLAN == 0 in https://github.com/k8snetworkplumbingwg/sriov-cni/blob/master/pkg/sriov/sriov.go#L226, or add an additional configurable field to sriov-cni (e.g. VFtrunk=true) which would skip the VLAN config.
If necessary, I can write a longer post explaining all the details of this.