
traffic-gen: Get pci env var by name #94

Merged
merged 6 commits into from
Mar 14, 2023

Conversation

RamLavi
Collaborator

@RamLavi RamLavi commented Mar 13, 2023

sriov-network-device-plugin recently added a new entry to the container env vars.
This causes the checkup to fail while the traffic generator pod is starting.
The reason is that the trex config script currently looks up the PCI-address env
variable on the assumption that only one env var fits the description.
Since this assumption no longer holds, change the script to use the exact
name of the env variable.
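
To illustrate why the single-match assumption breaks, here is a minimal Go sketch (not the checkup's actual script). The variable names, the `_INFO` suffix of the second entry, and its value are assumptions made up for the example. Only the idea comes from the description above: once a second variable fits the pattern, a pattern match is ambiguous while an exact-name lookup is not.

```go
package main

import (
	"fmt"
	"strings"
)

// matchByPrefix returns every "NAME=value" entry whose name starts with prefix.
func matchByPrefix(environ []string, prefix string) []string {
	var matches []string
	for _, entry := range environ {
		if strings.HasPrefix(entry, prefix) {
			matches = append(matches, entry)
		}
	}
	return matches
}

// lookupExact returns the value of the variable whose name equals name exactly.
func lookupExact(environ []string, name string) (string, bool) {
	for _, entry := range environ {
		key, value, found := strings.Cut(entry, "=")
		if found && key == name {
			return value, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical environment of the traffic generator container.
	environ := []string{
		"PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.0,0000:19:0a.1",
		`PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO={"hypothetical":"metadata"}`,
	}

	// Two entries match the prefix, so "the one matching variable" is ambiguous.
	fmt.Println(len(matchByPrefix(environ, "PCIDEVICE_")))

	// The exact name selects only the comma-separated PCI address list.
	value, ok := lookupExact(environ, "PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK")
	fmt.Println(value, ok)
}
```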

@RamLavi RamLavi changed the title Get pci env var by name traffic-gen: Get pci env var by name Mar 13, 2023
@RamLavi RamLavi requested a review from orelmisan March 13, 2023 10:52
@RamLavi RamLavi force-pushed the get_pci_env_var_by_name branch from 09f43e6 to bb8230b Compare March 13, 2023 11:01
Member

@orelmisan orelmisan left a comment

Thank you for the PR @RamLavi.
As we've discussed offline, this implementation will probably not solve BZ 2177668.

The SR-IOV device plugin uses the resourceName when it passes the PCI addresses of the VFs to kubelet as environment variables: https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin#pod-device-information.

We need to use the NetworkAttachmentDefinition's resourceName, instead of its name:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/intel_nics_dpdk
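
A rough Go sketch of deriving the env var name from that annotation value could look like the following. The function name `pciEnvVarName` is hypothetical, and the conversion rule (a `PCIDEVICE_` prefix, upper-casing, and replacing `/` and `.` with `_`) is an assumption based on my reading of the device plugin README linked above, so it should be checked against the plugin version in use.

```go
package main

import (
	"fmt"
	"strings"
)

// pciEnvVarName builds the name of the env var that is assumed to carry the
// allocated VF PCI addresses for the given resourceName.
// Assumed rule: "PCIDEVICE_" + upper-cased resourceName with "/" and "." replaced by "_".
func pciEnvVarName(resourceName string) string {
	replacer := strings.NewReplacer("/", "_", ".", "_")
	return "PCIDEVICE_" + strings.ToUpper(replacer.Replace(resourceName))
}

func main() {
	// Value taken from the k8s.v1.cni.cncf.io/resourceName annotation above.
	fmt.Println(pciEnvVarName("openshift.io/intel_nics_dpdk"))
	// prints "PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK"
}
```

Deriving the name from the resourceName rather than from the NetworkAttachmentDefinition's own name keeps the lookup aligned with what the device plugin exports.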

@RamLavi RamLavi force-pushed the get_pci_env_var_by_name branch 5 times, most recently from 6816925 to 8827f61 Compare March 13, 2023 14:36
Member

@orelmisan orelmisan left a comment

Thank you for the changes @RamLavi.
Could you please update the PR's description with a short explanation of the problem and the suggested solution?
Could you also please add a link to the BZ?

@RamLavi RamLavi force-pushed the get_pci_env_var_by_name branch from 8827f61 to aadaa65 Compare March 13, 2023 19:26
@RamLavi
Collaborator Author

RamLavi commented Mar 13, 2023

CI still fails (but for another reason).
The PR change correctly configures /etc/trex_cfg.yaml:

+ cat /etc/trex_cfg.yaml
- port_limit: 2
  version: 2
  interfaces:
    - "0000:19:0a.0"
    - "0000:19:0a.1"
  port_bandwidth_gb: 10
  port_info:
    - ip: 10.10.10.2
      default_gw: 10.10.10.1
    - ip: 10.10.20.2
      default_gw: 10.10.20.1
  platform:
    master_thread_id: 2
    latency_thread_id: 42
    dual_if:
      - socket: 0
        threads: [4,6,8,44,46,48]

@RamLavi
Collaborator Author

RamLavi commented Mar 13, 2023

Change: Review fixes
@orelmisan

Comment on lines +440 to +441
if checkupConfig.Verbose {
log.Printf("envVars: %v", envVars)
Member

Could you please split this change to a commit of its own?

Collaborator Author

DONE

RamLavi added 4 commits March 14, 2023 09:57
The env var name is extracted from the net-attach-def annotation with
given recipe [0].
This env var is exported to the traffic generator pod as an env var.

[0]
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin#pod-device-information

Signed-off-by: Ram Lavi <[email protected]>
Currently the trex config script is trying to find the PCI-address env
variable, assuming that there is only one env var that fits the
description.
Since this assumption is wrong, changing the script to use the exact
name of the env variable.

Signed-off-by: Ram Lavi <[email protected]>
@RamLavi RamLavi force-pushed the get_pci_env_var_by_name branch from aadaa65 to da04b23 Compare March 14, 2023 07:57
@RamLavi
Collaborator Author

RamLavi commented Mar 14, 2023

passes on CNV4.12 cluster:

$ TEST_NAMESPACE=ralavi-ns make TEST_IMAGE=quay.io/ramlavi/kubevirt-dpdk-checkup:latest TRAFFIC_GEN_IMAGE_URL=quay.io/ramlavi/kubevirt-dpdk-checkup-traffic-gen:latest test/e2e 
docker run --rm \
           --volume /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup:/root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup:Z \
           --volume /root/.kube:/root/.kube:Z \
           --workdir /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup \
           -e KUBECONFIG=/root/.kube/config \
           -e TEST_IMAGE=quay.io/ramlavi/kubevirt-dpdk-checkup:latest \
           -e TEST_NAMESPACE=ralavi-ns \
           -e NETWORK_ATTACHMENT_DEFINITION_NAME= \
           -e RUNTIME_CLASS_NAME= \
           -e TRAFFIC_GEN_IMAGE_URL=quay.io/ramlavi/kubevirt-dpdk-checkup-traffic-gen:latest \
           -e VM_CONTAINER_DISK_IMAGE_URL= \
           docker.io/library/golang:1.19.4 go test ./tests/... -test.v -test.timeout=1h -ginkgo.v -ginkgo.timeout=1h
=== RUN   TestKubevirtDpdkCheckup
Running Suite: KubevirtDpdkCheckup Suite - /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests
=======================================================================================================
Random Seed: 1678781330

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/test_suite_test.go:63
[BeforeSuite] PASSED [0.004 seconds]
------------------------------
Execute the checkup Job should complete successfully
/root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/checkup_test.go:79
  [FAILED] in [It] - /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/checkup_test.go:89 @ 03/14/23 08:11:53.154
• [FAILED] [183.264 seconds]
Execute the checkup Job [It] should complete successfully
/root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/checkup_test.go:79

  [FAILED] checkup failed: {
  	"spec.param.networkAttachmentDefinitionName": "intel-dpdk-network-1",
  	"spec.param.testDuration": "1m",
  	"spec.param.trafficGeneratorImage": "quay.io/ramlavi/kubevirt-dpdk-checkup-traffic-gen:latest",
  	"spec.param.trafficGeneratorRuntimeClassName": "performance-performance-zeus10",
  	"spec.param.verbose": "true",
  	"spec.param.vmContainerDiskImage": "",
  	"spec.timeout": "10m",
  	"status.completionTimestamp": "2023-03-14T08:11:47Z",
  	"status.failureReason": "detected Error Packets on the traffic generator's side: Oerrors 0 Ierrors 5067",
  	"status.result.DPDKRxPacketDrops": "86317807",
  	"status.result.DPDKRxTestPackets": "573188359",
  	"status.result.DPDKTxPacketDrops": "0",
  	"status.result.DPDKVMNode": "zeus10.lab.eng.tlv2.redhat.com",
  	"status.result.trafficGeneratorInErrorPackets": "5067",
  	"status.result.trafficGeneratorNode": "zeus10.lab.eng.tlv2.redhat.com",
  	"status.result.trafficGeneratorOutputErrorPackets": "0",
  	"status.result.trafficGeneratorTxPackets": "840000006",
  	"status.startTimestamp": "2023-03-14T08:09:03Z",
  	"status.succeeded": "false"
  }
  In [It] at: /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/checkup_test.go:89 @ 03/14/23 08:11:53.154
------------------------------

Summarizing 1 Failure:
  [FAIL] Execute the checkup Job [It] should complete successfully
  /root/go/src/github.com/kiagnose/kubevirt-dpdk-checkup/tests/checkup_test.go:89

Ran 1 of 1 Specs in 183.269 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
--- FAIL: TestKubevirtDpdkCheckup (183.27s)
FAIL
FAIL	github.com/kiagnose/kubevirt-dpdk-checkup/tests	183.301s
FAIL
make: *** [Makefile:59: test/e2e] Error 1

@RamLavi
Collaborator Author

RamLavi commented Mar 14, 2023

On the CNV4.13 cluster it fails, but the PR does fix what it intended to fix:

+ cat /etc/trex_cfg.yaml
- port_limit: 2
  version: 2
  interfaces:
    - "0000:19:0a.0"
    - "0000:19:0a.3"
  port_bandwidth_gb: 10
  port_info:
    - ip: 10.10.10.2
      default_gw: 10.10.10.1
    - ip: 10.10.20.2
      default_gw: 10.10.20.1
  platform:
    master_thread_id: 2
    latency_thread_id: 42
    dual_if:
      - socket: 0
        threads: [4,6,8,44,46,48]

Member

@orelmisan orelmisan left a comment

Thank you for the changes @RamLavi.

@RamLavi RamLavi merged commit b28beba into kiagnose:main Mar 14, 2023