feat(tests): Add unit tests for DCGM exporter Service and ServiceMonitor #1707
Conversation
…tor reconciliation
Signed-off-by: Karthik Vetrivel <[email protected]>
Force-pushed from 02d09d7 to bf8398e
@karthikvetrivel this is a great start! I made a first pass and left some feedback on the TestServiceMonitor method.
testCases := []struct {
	description       string
	stateName         string
	crdPresent        bool
	dcgmEnabled       *bool
	nodeStatusEnabled *bool
	dcgmSMEnabled     *bool
	withEdits         bool
	wantState         gpuv1.State
	extraAssert       assertsFn
}{
Instead of separate fields for the various ClusterPolicy options, why not just define one field that contains the ClusterPolicy spec for the test case? For example, what about reducing this to:
testCases := []struct {
	description string
	stateName   string
	crdPresent  bool
	cpSpec      gpuv1.ClusterPolicySpec
	wantState   gpuv1.State
	extraAssert assertsFn
}{
This would eliminate the need for the code you have later on that updates the ClusterPolicy object for each test case.
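For illustration, a test case could then carry its full configuration directly. This is a rough sketch; the cpSpec field name and the exact shape of gpuv1.ClusterPolicySpec / gpuv1.DCGMExporterSpec used here are assumptions based on the snippets in this PR:

	{
		description: "dcgm-exporter with ServiceMonitor enabled",
		stateName:   "state-dcgm-exporter",
		crdPresent:  true,
		cpSpec: gpuv1.ClusterPolicySpec{
			DCGMExporter: gpuv1.DCGMExporterSpec{
				Enabled: truePtr,
				ServiceMonitor: &gpuv1.DCGMExporterServiceMonitorConfig{
					Enabled: truePtr,
				},
			},
		},
		wantState: gpuv1.Ready,
	},

The test body would then only need cp := &gpuv1.ClusterPolicy{Spec: tc.cpSpec} instead of mutating a base object per case.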
// Base ClusterPolicy
cp := &gpuv1.ClusterPolicy{Spec: gpuv1.ClusterPolicySpec{}}
// Configure enables
if tc.dcgmEnabled != nil {
	cp.Spec.DCGMExporter.Enabled = tc.dcgmEnabled
}
if tc.nodeStatusEnabled != nil {
	cp.Spec.NodeStatusExporter.Enabled = tc.nodeStatusEnabled
}
// Configure DCGM SM
if tc.stateName == "state-dcgm-exporter" {
	if tc.dcgmSMEnabled != nil {
		cp.Spec.DCGMExporter.ServiceMonitor = &gpuv1.DCGMExporterServiceMonitorConfig{Enabled: tc.dcgmSMEnabled}
	}
}
As noted above, I think we could remove the need for this code if we just defined the ClusterPolicy spec for each test case.
// If edits are requested, seed CP config for edits
if tc.withEdits {
	cp.Spec.DCGMExporter.ServiceMonitor = &gpuv1.DCGMExporterServiceMonitorConfig{
		Enabled:          truePtr,
		Interval:         promv1.Duration("15s"),
		HonorLabels:      truePtr,
		AdditionalLabels: map[string]string{"a": "b"},
		Relabelings:      []*promv1.RelabelConfig{{Action: "keep"}},
	}
}
As noted above, I think we could remove the need for this code if we just embedded this configuration in the test case itself.
// Build the ServiceMonitor resource template
sm := promv1.ServiceMonitor{
	ObjectMeta: metav1.ObjectMeta{Name: "test-sm", Labels: map[string]string{}},
	Spec: promv1.ServiceMonitorSpec{
		NamespaceSelector: promv1.NamespaceSelector{MatchNames: []string{"FILLED BY THE OPERATOR"}},
		Endpoints:         []promv1.Endpoint{{}},
	},
}
nit -- since this ServiceMonitor template does not differ between test cases, do we want to define this variable outside of the for loop?
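For example (a sketch following the snippet above), the template could be built once before the loop and deep-copied per case if the code under test mutates it:

	// Shared ServiceMonitor template; identical for every test case, so build it once.
	smTemplate := promv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{Name: "test-sm", Labels: map[string]string{}},
		Spec: promv1.ServiceMonitorSpec{
			NamespaceSelector: promv1.NamespaceSelector{MatchNames: []string{"FILLED BY THE OPERATOR"}},
			Endpoints:         []promv1.Endpoint{{}},
		},
	}

	for _, tc := range testCases {
		t.Run(tc.description, func(t *testing.T) {
			// Work on a copy so one test case cannot leak mutations into the next.
			sm := *smTemplate.DeepCopy()
			// ...
		})
	}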
// Build fake client, optionally seed CRD existence by registering type only
b := fake.NewClientBuilder().WithScheme(scheme)
if tc.crdPresent {
	crd := &apiextensionsv1.CustomResourceDefinition{ObjectMeta: metav1.ObjectMeta{Name: ServiceMonitorCRDName}}
	b = b.WithObjects(crd)
}
k8sClient := b.Build()
Not critical, but we could consider adding a client field to each test case and building the fake client in-line. For example, for test cases where the ServiceMonitor CRD does not exist we would have:
client: fake.NewClientBuilder().WithScheme(scheme).Build()
while for the test cases where the ServiceMonitor CRD does exist we would have:
client: fake.NewClientBuilder().WithScheme(scheme).WithObjects(serviceMonitorCRD).Build()
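Put together, that could look like the following sketch (serviceMonitorCRD is a hypothetical shared fixture; the other test-case fields are elided):

	// Hypothetical CRD fixture shared by the test cases that need it.
	serviceMonitorCRD := &apiextensionsv1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: ServiceMonitorCRDName},
	}

	testCases := []struct {
		description string
		client      client.Client
		// ... remaining fields as above ...
	}{
		{
			description: "ServiceMonitor CRD not installed",
			client:      fake.NewClientBuilder().WithScheme(scheme).Build(),
		},
		{
			description: "ServiceMonitor CRD installed",
			client:      fake.NewClientBuilder().WithScheme(scheme).WithObjects(serviceMonitorCRD).Build(),
		},
	}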
extraAssert: func(t *testing.T, c client.Client, name, ns string) {
	// Verify object created with edits
	found := &promv1.ServiceMonitor{}
	err := c.Get(context.TODO(), client.ObjectKey{Namespace: ns, Name: name}, found)
	require.NoError(t, err)
	require.Equal(t, promv1.Duration("15s"), found.Spec.Endpoints[0].Interval)
	require.Equal(t, true, found.Spec.Endpoints[0].HonorLabels)
	require.Equal(t, "b", found.Labels["a"])
	require.NotNil(t, found.Spec.Endpoints[0].RelabelConfigs)
	require.Equal(t, 1, len(found.Spec.Endpoints[0].RelabelConfigs))
},
},
Instead of defining custom assert logic here, why don't we define the expected ServiceMonitor object here and compare it to the actual object that gets created when executing the test case?
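A sketch of what that could look like (the wantSM field name and the expected values are assumptions based on the edits seeded above, not the exact operator output):

	// In the test case definition:
	wantSM: &promv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test-sm",
			Namespace: "test-ns",
			Labels:    map[string]string{"a": "b"},
		},
		Spec: promv1.ServiceMonitorSpec{
			NamespaceSelector: promv1.NamespaceSelector{MatchNames: []string{"test-ns"}},
			Endpoints: []promv1.Endpoint{{
				Interval:       promv1.Duration("15s"),
				HonorLabels:    true,
				RelabelConfigs: []*promv1.RelabelConfig{{Action: "keep"}},
			}},
		},
	},

	// In the test body, after reconciliation:
	if tc.wantSM != nil {
		found := &promv1.ServiceMonitor{}
		err := k8sClient.Get(context.Background(), client.ObjectKey{Namespace: "test-ns", Name: "test-sm"}, found)
		require.NoError(t, err)
		// Compare only the fields the operator is expected to set,
		// or the whole spec if it is fully deterministic.
		require.Equal(t, tc.wantSM.Labels, found.Labels)
		require.Equal(t, tc.wantSM.Spec, found.Spec)
	}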
if tc.extraAssert != nil {
	tc.extraAssert(t, k8sClient, "test-sm", "test-ns")
}
As suggested in a prior comment, for test cases where we expect a ServiceMonitor object to be created, would it be possible to just compare the two service monitor objects (expected vs actual)?
dcgmSMEnabled: truePtr,
withEdits:     true,
wantState:     gpuv1.Ready,
extraAssert: func(t *testing.T, c client.Client, name, ns string) {
Instead of defining a closure to define additional assert blocks, can we just define a separate test case to assert the extra fields? That would make it easier to read.
Personally speaking, I'd avoid closures and just stick to more procedural code. It makes test cases easier to read.
wantState: gpuv1.Ready,
extraAssert: func(t *testing.T, c client.Client, name, ns string) {
	found := &corev1.Service{}
	err := c.Get(context.TODO(), client.ObjectKey{Namespace: ns, Name: name}, found)
We only use context.TODO() when we intend to pass in a ctx object later on. Since this most likely isn't a TODO, I would suggest using context.Background() instead.
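For the lookups in the snippets above, that would be along the lines of:

	found := &corev1.Service{}
	err := c.Get(context.Background(), client.ObjectKey{Namespace: ns, Name: name}, found)
	require.NoError(t, err)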
This PR adds unit tests for the ServiceMonitor and Service controller functions, which create the Prometheus ServiceMonitor custom resources and Kubernetes Service resources for GPU telemetry components.
Changes
Test Coverage
Test coverage bump from 18.7% to 19.7%