Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webhook: Admission metrics are not reported by status code #2972

Open
ahmetb opened this issue Oct 9, 2024 · 5 comments
Open

webhook: Admission metrics are not reported by status code #2972

ahmetb opened this issue Oct 9, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@ahmetb
Copy link
Member

ahmetb commented Oct 9, 2024

This is several bugs in one bug since it appears multiple things are broken. Apologies if it gets too complicated.

Problem 1: It appears that controller-runtime webhooks always respond with HTTP status code 200 OK even if the request is completely malformed.

Example (sending a malformed request to a webhook endpoint):

curl -k -v https://localhost:9443/validate-ship-example-org-v1beta1-frigate
< HTTP/2 200
< ... other response headers ...
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}

Problem 2: controller-runtime webhooks do not use the metav1.Status.Code and always respond with HTTP status code 200 OK.

Example code I used to register my custom handler at /test:

webhookServer.Register("/test", &webhook.Admission{Handler: &myHandler{}})
...

type myHandler struct{}
func (*myHandler) Handle(context.Context, admission.Request) admission.Response {
	return admission.Response{
		AdmissionResponse: admissionv1.AdmissionResponse{
			Allowed: false,
			Result: &metav1.Status{
				Code:    403,
				Message: "things suck",
				Status:  "Failure",
				Reason:  metav1.StatusReasonForbidden}}}
}

And here's the query I am making:

curl -XPOST -k -v -H "Content-Type: application/json" https://localhost:9443/test

< HTTP/2 200
...
<
{"kind":"AdmissionReview","apiVersion":"admission.k8s.io/v1","response":{"uid":"","allowed":false,"status":{"metadata":{},"status":"Failure","message":"things suck","reason":"Forbidden","code":403}}}

Problem 3: controller-runtime doesn't expose any metrics about rejected requests and their .status.codes. This is likely because

  1. we always report code=200 no matter the response (Problem 1 & 2)
  2. it appears we're capping controller_runtime_webhook_requests_total{code=...} dimension to only 200 and 500 here:
    lat := RequestLatency.MustCurryWith(lbl)
    cnt := RequestTotal.MustCurryWith(lbl)
    gge := RequestInFlight.With(lbl)
    // Initialize the most likely HTTP status codes.
    cnt.WithLabelValues("200")
    cnt.WithLabelValues("500")

so even if I make a faulty request, only the code=200 metric goes up:

controller_runtime_webhook_requests_total{code="200",webhook="/test"} 4

In an ideal state, I'd like to see controller_runtime_webhook_requests_total metric actually let me understand things like how many requests I'm denying (non-200 codes, such as 403, 400, 429, ...).

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 9, 2024
@troy0820
Copy link
Member

I believe the 200 response is if the webhook can create the response. There was an issue around this with failure policy and not sure if this helps with your issue #2662.

@alvaroaleman
Copy link
Member

Pretty much what troy said. I do agree that this is confusing and that it would be helpful to have metrics around the status code in the admission response.

@ahmetb
Copy link
Member Author

ahmetb commented Oct 15, 2024

Indeed, after creating the issue I've realized HTTP response code must be 200 regardless of the admission response (which contains the status code the apiserver would return in its response body).

Maybe we could still add a metric that parses the code out from admission.Response? (I understand we can't help if someone actually decides to implement their own http.Handler).

@ahmetb
Copy link
Member Author

ahmetb commented Oct 16, 2024

/remove-kind bug
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Oct 16, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

5 participants