Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor observability controller #1178

Merged
merged 1 commit into from
Mar 14, 2025
Merged

Conversation

eguzki
Copy link
Contributor

@eguzki eguzki commented Feb 27, 2025

What

Following up the work being done in #1164. Refactor to, hopefully, simplify the code to maintain.

  • Monitoring resources loaded in memory are no longer static. They are instantiated every time. Reason being that go client might update objects on creation.
  • Implemented mutator pattern with createOnly mutator, having the same functionality: resources can externally be updated and the operator will not revert the changes back.
  • Added error handling. When resource creation reports error, the error is being returned and propagated up in the stack.
  • istiodMonitor and envoyGatewayMonitor are not created repeatedly for each gateway. They are single instance per Istio control plane / EnvoyGateway control plane.

Verification steps

  • Setup environment based on EnvoyGateway
make local-setup GATEWAYAPI_PROVIDER=envoygateway
  • Apply the Kuadrant custom resource with observability enabled
kubectl -n kuadrant-system apply -f - <<EOF
---
apiVersion: kuadrant.io/v1beta1
kind: Kuadrant
metadata:
  name: kuadrant-sample
spec:
  observability:
    enable: true
EOF
  • Wait for kuadrant instance to be ready
kubectl wait --timeout=300s --for=condition=Ready kuadrant kuadrant-sample -n kuadrant-system
  • Verify the monitoring resources are there
k get servicemonitor,podmonitor -A

The expected result:

NAMESPACE              NAME                                                              AGE
envoy-gateway-system   servicemonitor.monitoring.coreos.com/envoy-gateway-monitor        148m
kuadrant-system        servicemonitor.monitoring.coreos.com/authorino-operator-monitor   148m
kuadrant-system        servicemonitor.monitoring.coreos.com/dns-operator-monitor         148m
kuadrant-system        servicemonitor.monitoring.coreos.com/kuadrant-operator-monitor    148m
kuadrant-system        servicemonitor.monitoring.coreos.com/limitador-operator-monitor   148m

NAMESPACE        NAME                                                   AGE
gateway-system   podmonitor.monitoring.coreos.com/envoy-stats-monitor   148m

Filtering by labels kuadrant-observability=true should give the same output.

k get servicemonitor,podmonitor -l kuadrant.io/observability=true -A
  • Turn monitoring off
kubectl patch kuadrant kuadrant-sample --type=merge --patch '{"spec": {"observability": null}}' -n kuadrant-system
  • Verify the monitoring resources are gone
k get servicemonitor,podmonitor -l kuadrant.io/observability=true -A

The expected result:

No resources found

@eguzki eguzki requested a review from david-martin February 27, 2025 13:29
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 96.38989% with 10 lines in your changes missing coverage. Please review.

Project coverage is 83.91%. Comparing base (2ae754e) to head (2c43a4f).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
controllers/observability_reconciler.go 96.37% 8 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1178      +/-   ##
==========================================
+ Coverage   83.38%   83.91%   +0.53%     
==========================================
  Files          82       82              
  Lines        7120     7388     +268     
==========================================
+ Hits         5937     6200     +263     
- Misses        949      952       +3     
- Partials      234      236       +2     
Flag Coverage Δ
bare-k8s-integration 24.66% <45.12%> (+1.24%) ⬆️
controllers-integration 71.28% <3.97%> (-3.00%) ⬇️
envoygateway-integration 42.53% <63.89%> (+0.89%) ⬆️
gatewayapi-integration 19.21% <3.97%> (-0.85%) ⬇️
istio-integration 45.82% <79.06%> (+1.80%) ⬆️
unit 18.37% <0.00%> (-0.72%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
api/v1beta1 (u) 90.78% <ø> (ø)
api/v1beta2 (u) ∅ <ø> (∅)
pkg/common (u) ∅ <ø> (∅)
pkg/istio (u) 62.06% <ø> (ø)
pkg/log (u) 93.18% <ø> (ø)
pkg/reconcilers (u) 25.16% <100.00%> (+0.49%) ⬆️
pkg/rlptools (u) ∅ <ø> (∅)
controllers (i) 87.11% <96.42%> (+0.58%) ⬆️
Files with missing lines Coverage Δ
controllers/state_of_the_world.go 90.93% <100.00%> (ø)
controllers/observability_reconciler.go 94.75% <96.37%> (+15.65%) ⬆️

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

r.createMonitor(ctx, monitorObjs, kOpMonitorSpec, observability.ServiceMonitorsResource, logger)
kOpMonitor := kOpMonitorBuild(r.namespace)
err := r.createServiceMonitor(ctx, kOpMonitor, logger)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors are deliberately only logged, not propagated up the stack, to allow other monitors to possibly be still created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will leave as it is then. However, if there is a net issue or API server downtime issue, the operator would not retry until there is another event 🤷

@eguzki
Copy link
Contributor Author

eguzki commented Feb 27, 2025

There is an issue with the hardcoded namespaces of istio and envoygateway. The namespace of those control planes cannot be assumed to be those hardcoded. I will address this issue in a follow up PR

@eguzki
Copy link
Contributor Author

eguzki commented Feb 27, 2025

There is an issue with the lifecycle of the monitors when a gateway is deleted. The monitoring resource does not get deleted. I will address this issue in a follow up PR

@eguzki eguzki force-pushed the refactor-observability-controller branch from 686b416 to 91df7cc Compare February 27, 2025 15:58
@eguzki eguzki marked this pull request as ready for review February 27, 2025 15:58
@eguzki eguzki requested a review from a team as a code owner February 27, 2025 15:58
Signed-off-by: Eguzki Astiz Lezaun <[email protected]>
@eguzki eguzki force-pushed the refactor-observability-controller branch from 91df7cc to 2c43a4f Compare February 28, 2025 11:02
@Boomatang
Copy link
Contributor

There is an issue with the lifecycle of the monitors when a gateway is deleted. The monitoring resource does not get deleted. I will address this issue in a follow up PR

@eguzki is that an exist issue?

@Patryk-Stefanski
Copy link

@Boomatang, I created the follow-up issues below. They seem like good first issues that I could get started with, @eguzki could we sync on these when you're back?

@eguzki
Copy link
Contributor Author

eguzki commented Mar 11, 2025

uld we sync on these when you're

Yes, those are good first issues. Send invite and we sync.

@Patryk-Stefanski
Copy link

The verification steps were successful. Turning the observability option on and off created and deleted the monitoring resources as expected. /lgtm

@eguzki
Copy link
Contributor Author

eguzki commented Mar 13, 2025

kindly asking for review @david-martin @Patryk-Stefanski

@eguzki eguzki added this pull request to the merge queue Mar 14, 2025
Merged via the queue into main with commit 33b9676 Mar 14, 2025
41 checks passed
@eguzki eguzki deleted the refactor-observability-controller branch March 14, 2025 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants