[Prometheus] Add ray_services_ready_duration_seconds metric #3261

Open
wants to merge 1 commit into
base: master

win5923
Contributor

@win5923 win5923 commented Apr 1, 2025

Why are these changes needed?

Add the ray_services_ready_duration_seconds metric to track the time from when a RayService is created until it becomes ready (creation to the RayServiceReady condition being true).

Manual test

$ k apply -f config/samples/ray-service.sample.yaml

$ echo $(( $(date -d "$(kubectl get rayservice rayservice-sample -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')" +%s) - $(date -d "$(kubectl get rayservice rayservice-sample -o jsonpath='{.metadata.creationTimestamp}')" +%s) )) seconds
62 seconds

$ k apply -f config/samples/ray-service.sample.yaml -n test

$ echo $(( $(date -d "$(kubectl get rayservice rayservice-sample -n test -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')" +%s) - $(date -d "$(kubectl get rayservice rayservice-sample -n test -o jsonpath='{.metadata.creationTimestamp}')" +%s) )) seconds
32 seconds

$ k apply -f config/samples/ray-service.sample-2.yaml

$ echo $(( $(date -d "$(kubectl get rayservice rayservice-sample-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')" +%s) - $(date -d "$(kubectl get rayservice rayservice-sample-2 -o jsonpath='{.metadata.creationTimestamp}')" +%s) )) seconds
32 seconds

Related issue number

Closes #3177

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Comment on lines +214 to +219
// Record ray_services_ready_duration_seconds metric if the current RayService is ready and the previous one is not.
if !meta.IsStatusConditionTrue(originalRayServiceInstance.Status.Conditions, string(rayv1.RayServiceReady)) &&
	meta.IsStatusConditionTrue(rayServiceInstance.Status.Conditions, string(rayv1.RayServiceReady)) {
	readyDuration := time.Since(rayServiceInstance.CreationTimestamp.Time)
	common.ObserveRayServicesReadyDuration(rayServiceInstance.Namespace, readyDuration)
}
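
To make the guard above easier to reason about in isolation, here is a minimal, dependency-free sketch of the same not-ready to ready check. The condition type and the isConditionTrue and becameReady helpers are simplified stand-ins written for this example (assumptions, not the real metav1.Condition or meta.IsStatusConditionTrue), and the "Ready" type string is taken from the jsonpath query in the manual test.

```go
package main

import "fmt"

// condition is a simplified stand-in for metav1.Condition.
type condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// isConditionTrue mimics meta.IsStatusConditionTrue for this sketch:
// it is true only if a condition of the given type exists with status "True".
func isConditionTrue(conds []condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// becameReady reports whether Ready went from not true (before) to true (after).
// A missing Ready condition counts as not true.
func becameReady(before, after []condition) bool {
	return !isConditionTrue(before, "Ready") && isConditionTrue(after, "Ready")
}

func main() {
	notReady := []condition{{Type: "Ready", Status: "False"}}
	ready := []condition{{Type: "Ready", Status: "True"}}

	fmt.Println(becameReady(nil, ready))      // no condition yet, now ready
	fmt.Println(becameReady(notReady, ready)) // explicit transition
	fmt.Println(becameReady(ready, ready))    // already ready, no transition
	fmt.Println(becameReady(ready, notReady)) // went unready, no transition
}
```

Only the first two calls report a transition, so the duration would be observed once per RayService becoming ready rather than on every reconcile of an already-ready service.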
Contributor Author

I record the metric here because the following issue causes controller-runtime to reconcile again, which would otherwise result in the metric for the same RayService being recorded twice.

{"level":"info","ts":"2025-04-01T12:12:55.567Z","logger":"controllers.RayService","msg":"Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler","RayService":{"name":"rayservice-sample","namespace":"default"},"reconcileID":"88aadcdd-dfa4-49b8-b290-ffb5d8b06ba0"}
{"level":"error","ts":"2025-04-01T12:12:55.568Z","logger":"controllers.RayService","msg":"Reconciler error","RayService":{"name":"rayservice-sample","namespace":"default"},"reconcileID":"88aadcdd-dfa4-49b8-b290-ffb5d8b06ba0","error":"Operation cannot be fulfilled on rayservices.ray.io \"rayservice-sample\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"}
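
As a rough illustration of the double-recording concern, the sketch below counts how many observations two strategies would make over a sequence of reconcile passes. Everything here is hypothetical (the function names, and the pass sequence that models a retry after a conflict error); it only contrasts "record whenever ready" with "record on the not-ready to ready transition" and does not reproduce the controller's actual flow.

```go
package main

import "fmt"

// countNaive records once for every reconcile pass in which the service is ready.
func countNaive(readyPerPass []bool) int {
	n := 0
	for _, ready := range readyPerPass {
		if ready {
			n++
		}
	}
	return n
}

// countOnTransition records only when a pass sees the service go from
// not ready to ready, so a retried pass on a ready service adds nothing.
func countOnTransition(readyPerPass []bool) int {
	n := 0
	prev := false // a newly created service starts out not ready
	for _, ready := range readyPerPass {
		if !prev && ready {
			n++
		}
		prev = ready
	}
	return n
}

func main() {
	// Hypothetical sequence: pass 1 not ready, pass 2 becomes ready,
	// pass 3 is a retry (for example after a conflict error) while still ready.
	passes := []bool{false, true, true}
	fmt.Println("naive:", countNaive(passes))
	fmt.Println("on transition:", countOnTransition(passes))
}
```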

@kevin85421
Member

Could you rebase onto the master branch?

Development

Successfully merging this pull request may close these issues.

[Feature][metrics] ray_services_ready_duration_seconds