Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readiness probe #365

Merged
merged 3 commits into from
Nov 25, 2022
Merged

Readiness probe #365

merged 3 commits into from
Nov 25, 2022

Conversation

guicassolato
Copy link
Collaborator

@guicassolato guicassolato commented Nov 18, 2022

Implements health and readiness probe endpoints for the controllers. ATM, the endpoints are trivial readiness and health check endpoints the by default always return "ok".

The aggregated status of all AuthConfigs watched by the controller is an opt-in check.

New endpoints introduced:

  • /healthy: Health probe (ping)
  • /readyz: Aggregated readiness probe (by default, none)
  • /readyz/authconfigs: Aggregated status of all AuthConfigs watched by the controller

In general, all endpoints return either 200 ("ok") or 500 (when 1+ probes fail).

The default binding network address is :8081. It can be changed using a newly introduced flag (command-line arg) --health-probe-addr.

The following query string parameters are supported:

  • verbose=any: more verbose response messages;
  • include=authconfigs: to include the aggregated status of AuthConfigs in the response;
  • exclude=(check name): to exclude a particular probe - has no effect when passed on requests to /readyz/authconfigs.

Closes #355

Verification steps

$ make local-setup FF=1
$ kubectl port-forward deployment/authorino 8081:8081 &
$ curl http://localhost:8081/healthz
ok
$ curl http://localhost:8081/readyz
ok
$ kubectl apply -f -<<EOF
apiVersion: authorino.kuadrant.io/v1beta1
kind: AuthConfig
metadata:
  name: talker-api-protection
spec:
  hosts:
  - talker-api-authorino.127.0.0.1.nip.io
EOF
authconfig.authorino.kuadrant.io/talker-api-protection created
$ curl http://localhost:8081/readyz
ok
$ curl "http://localhost:8081/readyz?verbose=true"
[+]authconfigs ok
healthz check passed
$ kubectl apply -f -<<EOF
apiVersion: authorino.kuadrant.io/v1beta1
kind: AuthConfig
metadata:
  name: talker-api-protection-2
spec:
  hosts:
  - talker-api-authorino.127.0.0.1.nip.io
EOF
authconfig.authorino.kuadrant.io/talker-api-protection-2 created
$ curl http://localhost:8081/readyz
ok
$ curl "http://localhost:8081/readyz?include=authconfigs"
[-]authconfigs failed: reason withheld
healthz check failed
$ curl http://localhost:8081/readyz/authconfigs
internal server error: authconfig is not ready: default/talker-api-protection-2 (reason: HostsNotLinked)

Implements health and readiness probe endpoints for the controllers, reporting particularly the aggregated state of the AuthConfigs.

New endpoints:
- `/healthy`: Health probe (ping)
- `/readyz`: Aggregated readiness probe (only AuthConfig reconciler currently reporting)
- `/readyz/authconfigs`: Aggregated status of the AuthConfigs

The default binding network address is `:8081`. It can be changed using the newly introduced flag (command-line arg) `--health-probe-addr`.

The endpoints return either `200` ("ok") or `500` when 1+ probes fail.

The query string parameters `verbose=true` and `exclude=authconfigs` are supported respectively to provide more verbose responses and exclude a particular probe ("authconfigs" in the example provided).

Closes #355
@guicassolato guicassolato self-assigned this Nov 18, 2022
@guicassolato guicassolato requested a review from a team November 18, 2022 12:24
@guicassolato guicassolato mentioned this pull request Nov 21, 2022
Merged
@guicassolato guicassolato marked this pull request as draft November 22, 2022 15:16
@guicassolato
Copy link
Collaborator Author

It turns out this approach can be improved to solve the issue for the readiness probe.

An AuthConfig being not ready (due to a host collision, for example) has nothing to do with the AuthConfig Reconciler (or even less with the Authorino instance!) not being ready. Exposing this endpoint through controller-runtime's readiness check mixes those two things together and introduces a risk of occasionally shutting the entire Authorino instance off because of a unexceptional situation of having one or more AuthConfigs not ready.

This is because one could think that the /readyz endpoint is suitable for configuring the readinessProbe of the Authorino Deployment, when in fact, as-is, it is not. Currently, this is not done by the Authorino Operator, but when it starts doing so, this endpoint cannot be used indiscriminately. To use this endpoint, the authconfigs check would have to be excluded by supplying in the request the exclude=authconfigs parameter.

Te check the aggregated readiness status of all AuthConfigs watched by the reconciler, any of the following endpoints can be used:
- `/readyz?include=authconfigs` - all readiness checks + authconfigs
- `/readyz/authconfigs` - only the authconfigs readiness check
@guicassolato guicassolato marked this pull request as ready for review November 23, 2022 16:07
@guicassolato
Copy link
Collaborator Author

This is because one could think that the /readyz endpoint is suitable for configuring the readinessProbe of the Authorino Deployment, when in fact, as-is, it is not.

No longer true.

I've changed so the authconfigs check is now opt-in. Therefore people can use the upper level (aggregated) /readyz endpoint to set readiness probe for the Authorino Deployment without risking the status of AuthConfig CRs to affect the state of the service itself.

One who wants to include the readiness state of the AuthConfigs in the overall aggregated readiness probe check can call /readyz?include=authconfigs (not suitable for setting readiness check for the Authorino Deployment).

Alternatively, only the aggregated status of the AuthConfig CRs watched by that particular Authorino instance can be checked by calling /readyz/authconfigs.

@@ -30,3 +30,12 @@ func SubtractSlice(sl1, sl2 []string) []string {
}
return diff
}

func SliceContains[T comparable](s []T, val T) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now I have to learn Golang generics 🥳

Copy link
Member

@didierofrivia didierofrivia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! The only issue I see is the one reported by Eguz regarding returning the map of statuses

@guicassolato guicassolato merged commit 90dd08f into main Nov 25, 2022
@guicassolato guicassolato deleted the controller-readiness-probe branch November 25, 2022 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Readiness Probe
3 participants