-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Devlink Error message #413
Devlink Error message #413
Conversation
I can see the golangci-lint due to "context deadline exceeded". This seems to be occurring on other PR's also. Will discuss in meeting. |
@adrianchiris Could you review? |
please note that we embed (sort of) the netlink error with |
@@ -67,7 +67,7 @@ func GetPfName(pciAddr string) (string, error) { | |||
if pfEswitchMode == "" { | |||
// If device doesn't support eswitch mode query or doesn't have sriov enabled, | |||
// fall back to the default implementation | |||
if err == nil || strings.Contains(strings.ToLower(fmt.Sprint(err)), "no such device") { | |||
if err == nil || strings.Contains(strings.ToLower(fmt.Sprint(err)), "error getting devlink device attributes for net device") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we are intentionally matching on the internal error returned by netlink.
@Eoghan1232 are you hitting an issue with intel NICs not returning "no such device" error when trying to get devlink device attributes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.
For example, in some external code that uses this function and have unit test with mock sysfs they might get "no such file or directory" instead. In that case, it breaks the non-devlink detection logic here.
Since GetPfEswitchMode()
returns error string from GetDevLinkDeviceEswitchAttrs()
it's safe to rely on that error message only if the underlying netlink fails to find such device in a system - real or mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.
if there is a mock sysfs ,cant you also mock GetDevLinkDeviceEswitchAttrs
?
we do not want to fallback to the regular flow if netlink failed for reason other than
- device doesnt support eswitch mode query
- device doesnt have sriov enabled
since in this case, if device is in switchdev mode but failed for a reason other than the above, we would want to fail else we risk returning the wrong PF name.
I can think of two things:
- adjust tests (netlink provider is an interface)
- change this to a list of expected strings and check if one is a substring of the returned error
do you have in house e2e tests that just use mock sysfs ? there are other calls to netlink in general are you not calling them ? (i.e GetLinkAttr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianchiris "no such device" maybe gets returned in real environment but in a test environment with mock sysfs you won't necessarily get this exact error.
if there is a mock sysfs ,cant you also mock
GetDevLinkDeviceEswitchAttrs
?we do not want to fallback to the regular flow if netlink failed for reason other than
- device doesnt support eswitch mode query
- device doesnt have sriov enabled
This is where the problem is. It does not come to the 2nd option at all because the first option return the error as with GetPfEswitchMode(pciAddr)
returned ("", err). So it evaluates error string comparison and it evaluates to false and return GetPfName()
to caller with error. Never get a chance to check sriov enable part. Please review the flow again.
since in this case, if device is in switchdev mode but failed for a reason other than the above, we would want to fail else we risk returning the wrong PF name.
I can think of two things:
- adjust tests (netlink provider is an interface)
- change this to a list of expected strings and check if one is a substring of the returned error
do you have in house e2e tests that just use mock sysfs ? there are other calls to netlink in general are you not calling them ? (i.e GetLinkAttr)
The NetlinkProvider
interface is internal to this package only. External client of this package cannot and should not have to mock any internal interfaces of this package. The utils.GetPfName()
function should be testable with a minimal mock sysfs as it used to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the problem is. It does not come to the 2nd option at all because the first option return the error as with GetPfEswitchMode(pciAddr) returned ("", err). So it evaluates error string comparison and it evaluates to false and return GetPfName() to caller with error. Never get a chance to check sriov enable part. Please review the flow again.
Do you mean the call to GetPfAddr()
fails in GetPfEswitchMode()
and returns error to GetPfName()
?
if GetPfAddr()
fails, it means it failed to read symlink /sys/bus/pci/devices/<pci addr>/physfn
so in this case we would fail in GetPfName()
even if we fallback to the previous flow since we try to read from the same base path. (unless in the mocked environment /sys/bus/pci/devices/<pci addr>/physfn
is not a symlink)
or do you mean, GetPfAddr() call in GetPfEswitchMode()
returns the PF addr of the VF (or its own address in case of PF) however the devlink call which follows this fails as the isnt actually a device ?
can you walk me through the flow to help me understand ? what does GetPfName
gets for input ? what is the mocked sysfs ?
The NetlinkProvider interface is internal to this package only.
actually its public and you can get and set the implementation, defaulting to defaultNetlinkProvider
which implements the NetlinkProvider interface, you can set a mock for it.
function should be testable with a minimal mock sysfs as it used to be.
I do not agree with this statement, i dont want to impose such constraints on implementation.
there are other places where we want to use devlink/netlink or other external libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Eoghan1232 could you rebase the PR ? there are some changes introduced with #410
i think we can simplify the whole flow flow here:
pfEswitchMode, err := GetPfEswitchMode(pciAddr)
if pfEswitchMode == "" || err != nil {
glog.Infof("Devlink query for eswitch mode is not supported for device %s. err: %v", pciAddr, err)
} else if pfEswitchMode == eswitchModeSwitchdev {
name, err := GetSriovnetProvider().GetUplinkRepresentor(pciAddr)
if err != nil {
return "", err
}
return name, nil
}
path := filepath.Join(sysBusPci, pciAddr, "physfn", "net")
which should work with a mocked sysfs test env
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR rebased.
the netlink error is embedded within the returned error, what is the full error you are getting. |
OK, was trying to remember why we matched on this specific string and not the one returned by netlink provider.
also we saw some wierd behaviour with netlink, hence the whole "checking for empty eswitch mode": I left another comment on how we can further simplify this, LMK what you think. |
fcd1ec9
to
bdf4e9b
Compare
This commit fixes an IF condition being met incorrectly.
The GetPfName() in utils package returns the wrong error even when device is non-devlink.
line in utils.go
because it compares underlying netlink provider error message to different string than what actual implementation returns.
line in netlink_provider.go
signed off:
[email protected]
thanks to @ahalimx86 for spotting this.