[Prometheus] Add ray_cluster_provisioned_duration_seconds metric #3212
base: master
@troychiu PTAL, thx.
Force-pushed a10273a to 3b66a69
Help: "The time from RayClusters created to all ray pods are ready for the first time (RayClusterProvisioned) in seconds",
// It may not be applicable to all users, but default buckets cannot be used either.
// For reference, see: https://github.com/prometheus/client_golang/blob/331dfab0cc853dca0242a0d96a80184087a80c1d/prometheus/histogram.go#L271
Buckets: []float64{30, 60, 120, 180, 240, 300, 600, 900, 1800, 3600},
Not sure what bucket ranges would be suitable for most users.
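For context on the bucket question: client_golang's `DefBuckets` top out at 10 seconds, so provisioning times measured in minutes would all fall into the `+Inf` bucket. The stdlib-only sketch below (no Prometheus dependency; the sample durations are made up) reproduces Prometheus's cumulative `le` bucket semantics for the proposed bounds, which makes it easy to check how a given set of durations would be distributed.

```go
package main

import (
	"fmt"
	"math"
)

// provisionedBuckets mirrors the upper bounds (in seconds) proposed in this PR.
var provisionedBuckets = []float64{30, 60, 120, 180, 240, 300, 600, 900, 1800, 3600}

// cumulativeCounts returns, for each upper bound plus a final +Inf bucket,
// how many observations are less than or equal to that bound. This matches
// the cumulative "le" semantics of a Prometheus histogram.
func cumulativeCounts(bounds []float64, observations []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range observations {
		for i, b := range bounds {
			if v <= b {
				counts[i]++
			}
		}
		counts[len(bounds)]++ // every observation is counted in +Inf
	}
	return counts
}

func main() {
	// Hypothetical provisioning durations, in seconds.
	durations := []float64{12, 45, 45, 250, 4000}
	counts := cumulativeCounts(provisionedBuckets, durations)
	for i, b := range provisionedBuckets {
		fmt.Printf("le=%g: %d\n", b, counts[i])
	}
	fmt.Printf("le=%g: %d\n", math.Inf(1), counts[len(provisionedBuckets)])
}
```

With these durations, `le=30` counts 1, `le=60` through `le=240` count 3, `le=300` through `le=3600` count 4, and only `+Inf` counts all 5, so a cluster taking longer than an hour shows up only as the gap between the last finite bucket and `+Inf`. If a fixed list proves hard to tune, client_golang also offers generators such as `prometheus.ExponentialBuckets(start, factor, count)`; for example, `ExponentialBuckets(30, 2, 8)` yields bounds from 30 to 3840 seconds.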
Force-pushed 5a481f0 to 27d38a5
Signed-off-by: win5923 <[email protected]>
Force-pushed 27d38a5 to a009b43
cc @troychiu let's prioritize reviewing this PR.
This PR is a follow-up to #3310.
@@ -1336,6 +1333,11 @@ func (r *RayClusterReconciler) calculateStatus(ctx context.Context, instance *ra
Reason: rayv1.AllPodRunningAndReadyFirstTime,
Message: "All Ray Pods are ready for the first time",
})

// Record ray_cluster_provisioned_duration_seconds duration metric
The metric should not be recorded in `calculateStatus`. It should be recorded only after the status update succeeds, so the same cluster is not counted more than once.
Why are these changes needed?
Add the ray_cluster_provisioned_duration_seconds metric to track the time from RayCluster creation until all Ray Pods are ready for the first time (RayClusterProvisioned).
Manual test:
Related issue number
Closes #3172
Checks