Most updated CustomResource status must be passed in each reconciliation loop #2765


Open
afalhambra-hivemq opened this issue Apr 11, 2025 · 7 comments · May be fixed by #2761

Comments


afalhambra-hivemq commented Apr 11, 2025

Is your feature request related to a problem?
We are currently in the middle of migrating from JOSDK 4 to JOSDK 5.
For us it is crucial that the CustomResource status passed to the main reconciliation loop is always the most up-to-date one.
As a best practice, our operator does not modify anything in the CustomResource other than the status, which is then used in subsequent reconciliations to potentially trigger a particular action based on that status (such as a rolling restart of the pods, for example).
What we have noticed when migrating from JOSDK 4 to JOSDK 5 is that this is no longer the case: we have observed that this CustomResource status is sometimes outdated. Apparently some informers trigger the reconciliation loop with an old version of the CustomResource status in the cache.
It seems this was handled and cached internally in JOSDK 4 here, but was removed in JOSDK 5.

Describe the solution you'd like
I guess there are a couple of possible options here:

  1. JOSDK 5 could provide a new API option so we can decide whether we want to update the underlying primary cache with the most up-to-date CustomResource status. For example:

         UpdateControl.forceCacheUpdate().patchStatus(platform);            // only updates the status in the cache
         UpdateControl.forceCacheUpdate().patchResource(platform);          // only updates the spec in the cache
         UpdateControl.forceCacheUpdate().patchResourceAndStatus(platform); // updates both in the cache

  2. An option to enable an internal caching mechanism similar to what was done in JOSDK 4.

Describe alternatives you've considered
Since our operator only modifies the CustomResource status, at the moment we simply update the primary cache with the CustomResource status change in every reconciliation loop (nothing else: no metadata or spec changes are written to this cache).

    // Workaround: overwrite the status of the resource held in the informer's
    // primary cache with the status we just computed, so the next reconciliation
    // sees it even if the informer has not yet received the updated resource.
    private static void updateCache(
            MyCustomResource primary,
            Context<MyCustomResource> context) {
        final var cache = context.getPrimaryCache();
        cache.get(new ResourceID(primary.getMetadata().getName(), primary.getMetadata().getNamespace()))
                .ifPresent(r -> r.setStatus(primary.getStatus()));
    }

Additional context
N/A.

@afalhambra-hivemq afalhambra-hivemq changed the title Provide support for keeping the most up-to-date CustomResource status passed in each reconciliation loop Provide support for keeping the most updated CustomResource status passed in each reconciliation loop Apr 11, 2025
@afalhambra-hivemq afalhambra-hivemq changed the title Provide support for keeping the most updated CustomResource status passed in each reconciliation loop Either keep the most updated CR status passed in each reconciliation loop or allow to update the primary cache with updated CR status Apr 11, 2025
@afalhambra-hivemq afalhambra-hivemq changed the title Either keep the most updated CR status passed in each reconciliation loop or allow to update the primary cache with updated CR status Most updated CustomResource status must be passed in each reconciliation loop Apr 12, 2025
csviri (Collaborator) commented Apr 14, 2025

Hi @afalhambra-hivemq, thanks for creating an issue for this, but it is a duplicate; see #2746.

Just some remarks:

  1. We will consider putting back UpdateControl.updateStatus specifically for this case, which uses the PUT/update method of the Kubernetes API with optimistic locking. In that case it is possible to maintain an overlay cache and pass the resource to reconcile on the next reconciliation. We will also add some examples and supporting classes to showcase how to do status caching explicitly.

Something like UpdateControl.forceCacheUpdate().patchStatus(platform); is not possible in general. Caching resources properly and deterministically (without parsing the resourceVersion, and thus bending the contract of the Kubernetes API) is not possible, and I would not like to bake it into the core.

  2. You should not do this in general; changing the cache explicitly can lead to all kinds of errors/issues with informers. (Note that this cache is the informer's cache and is meant to be read-only.) Although changing the status might work, in general it is not safe to do such updates.

See also: https://javaoperatorsdk.io/docs/faq/#im-managing-an-external-resource-that-has-a-generated-id-where-should-i-store-that
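The "status caching explicitly" idea mentioned above could be sketched roughly like this. Note this is only an illustration of the overlay-cache pattern, not JOSDK API: StatusOverlayCache and all its method names are hypothetical, and the getters/setters are supplied by the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical overlay cache: remembers the status returned by the last
// successful status update and re-applies it to the (possibly stale)
// resource delivered by the informer on the next reconciliation.
class StatusOverlayCache<R, S> {
    private final Map<String, S> latestStatus = new ConcurrentHashMap<>();
    private final Function<R, String> keyFn;        // e.g. namespace + "/" + name
    private final Function<R, S> statusGetter;
    private final BiConsumer<R, S> statusSetter;

    StatusOverlayCache(Function<R, String> keyFn,
                       Function<R, S> statusGetter,
                       BiConsumer<R, S> statusSetter) {
        this.keyFn = keyFn;
        this.statusGetter = statusGetter;
        this.statusSetter = statusSetter;
    }

    // Call with the resource returned by a successful status update.
    void remember(R updated) {
        latestStatus.put(keyFn.apply(updated), statusGetter.apply(updated));
    }

    // Call at the start of reconcile(): overlays the freshest known status
    // onto the resource coming from the informer cache.
    R overlay(R fromCache) {
        S s = latestStatus.get(keyFn.apply(fromCache));
        if (s != null) {
            statusSetter.accept(fromCache, s);
        }
        return fromCache;
    }
}
```

Unlike mutating the informer cache directly, this keeps the overlay in a structure the reconciler owns, leaving the informer's view untouched.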

csviri (Collaborator) commented Apr 14, 2025

Also, feel free to join our community meeting tomorrow at 15:00 CEST, where we will discuss this topic.

afalhambra-hivemq (Author) commented

Thanks @csviri, I will join. Will that be via Zoom, if I'm not mistaken, as documented here?

csviri (Collaborator) commented Apr 14, 2025

Thanks @csviri, I will join. Will that be via Zoom, if I'm not mistaken, as documented here?

Exactly

afalhambra-hivemq (Author) commented

We will consider putting back UpdateControl.updateStatus specifically for this case, which uses the PUT/update method of the Kubernetes API with optimistic locking.

Using optimistic locking is already problematic for us. We have an edge case with a race condition where the CR is updated twice at the same time, so optimistic locking fails straight away because the resourceVersion differs.
If we disable optimistic locking, then the patching works fine for this race condition.

If I'm not mistaken, the default retry mechanism didn't work here either, because the status of the CR there was outdated as well, but I need to double-check this.
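The race described above can be illustrated with a toy in-memory store. This is a hypothetical stand-in for the Kubernetes API server, not real client code: a PUT-style update based on a stale resourceVersion is rejected with a conflict, while a version-agnostic status patch simply takes the last-writer-wins path.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy in-memory "API server" illustrating why two updates based on the same
// snapshot conflict under optimistic locking, while a version-agnostic patch
// of only the status does not.
class ToyStore {
    private String status = "";
    private final AtomicLong resourceVersion = new AtomicLong(1);

    long version() { return resourceVersion.get(); }

    String status() { return status; }

    // PUT-style update with optimistic locking: fails on a stale resourceVersion.
    void update(long seenVersion, String newStatus) {
        if (seenVersion != resourceVersion.get()) {
            throw new IllegalStateException("409 Conflict: stale resourceVersion");
        }
        status = newStatus;
        resourceVersion.incrementAndGet();
    }

    // PATCH-style status change without optimistic locking: last writer wins.
    void patchStatus(String newStatus) {
        status = newStatus;
        resourceVersion.incrementAndGet();
    }
}
```

Two writers that both read version N and then both call update(N, ...) will see the second call fail, which is exactly the failure mode the comment describes.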

csviri (Collaborator) commented Apr 14, 2025

You can always store the response from the first update and do the second update using that resource.
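The suggestion above (base each update on the resource returned by the previous one, and refetch on conflict) could be sketched like this. ConflictRetry is a hypothetical helper, not JOSDK API; IllegalStateException stands in for whatever exception the client throws on an HTTP 409.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

class ConflictRetry {
    // updateFn performs the server round-trip and returns the server's view of
    // the resource (with its fresh resourceVersion). On a conflict we refetch
    // the latest resource and retry with that, so the next attempt is never
    // based on a stale resourceVersion.
    static <R> R updateWithRetry(R resource,
                                 UnaryOperator<R> updateFn,
                                 Supplier<R> refetchLatest,
                                 int maxAttempts) {
        R current = resource;
        for (int attempt = 1; ; attempt++) {
            try {
                return updateFn.apply(current);
            } catch (IllegalStateException conflict) { // stands in for an HTTP 409
                if (attempt >= maxAttempts) {
                    throw conflict;
                }
                current = refetchLatest.get();
            }
        }
    }
}
```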

csviri linked a pull request Apr 14, 2025 that will close this issue
afalhambra-hivemq (Author) commented

You can always store the response from the first update and do the second update using that resource.

I guess this would mean putting a custom retry mechanism in place for that? I haven't yet checked in detail how the default JOSDK retry mechanism handles this.
