
fix: invalidate cluster cache after sync operations #745


Open — pjiang-dev wants to merge 6 commits into master from pjiang/sync-invalidate-cache

Conversation

pjiang-dev
Contributor

@pjiang-dev pjiang-dev commented Jul 10, 2025

fixes argoproj/argo-cd#20458

Several times after a sync, the diff has shown the live resource state from before the sync, producing incorrect diffs. With this change, any sync that ran at least one valid (non no-op) task invalidates the cluster cache. To keep the operational cost down, we only invalidate the cache entries for resources that were actually modified. This guarantees fresh data for subsequent diff calculations.

This could also help fix an issue with CRDs: previously, after a CRD schema change was synced, the diff failed because it was computed against the stale schema.
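
For illustration, here is a condensed sketch of how the two hunks reviewed below fit together (the identifiers are the ones introduced by this PR; the surrounding engine and sync-context code is elided and simplified):

// 1. pkg/engine/engine.go — the engine registers a callback that forwards
//    modified resource keys to the cluster cache.
opts = append(opts, sync.WithCacheInvalidationCallback(func(modified []kube.ResourceKey) {
	if len(modified) > 0 {
		e.cache.InvalidateResources(modified) // InvalidateResources is added by this PR
	}
}))

// 2. pkg/sync/sync_context.go — after the sync, the sync context collects the
//    key of every resource it produced a result for and fires the callback.
if sc.cacheInvalidationCallback != nil {
	modified := make([]kubeutil.ResourceKey, 0, len(sc.syncRes))
	for _, result := range sc.syncRes {
		modified = append(modified, result.ResourceKey)
	}
	sc.cacheInvalidationCallback(modified)
}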


codecov bot commented Jul 10, 2025

Codecov Report

❌ Patch coverage is 47.54098% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.27%. Comparing base (8849c3f) to head (6342d8f).
⚠️ Report is 55 commits behind head on master.

Files with missing lines     Patch %   Lines
pkg/cache/cluster.go           0.00%   16 Missing ⚠️
pkg/engine/engine.go           0.00%   13 Missing ⚠️
pkg/sync/sync_context.go      90.62%   2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #745      +/-   ##
==========================================
- Coverage   54.26%   47.27%   -7.00%     
==========================================
  Files          64       64              
  Lines        6164     6589     +425     
==========================================
- Hits         3345     3115     -230     
- Misses       2549     3218     +669     
+ Partials      270      256      -14     

☔ View full report in Codecov by Sentry.
Contributor

@leoluz leoluz left a comment


Please check my comments.

@pjiang-dev pjiang-dev marked this pull request as ready for review July 11, 2025 16:28
@pjiang-dev pjiang-dev requested a review from a team as a code owner July 11, 2025 16:28
@pjiang-dev pjiang-dev force-pushed the pjiang/sync-invalidate-cache branch from f303abe to 6342d8f on July 29, 2025 21:06
Contributor

@leoluz leoluz left a comment


Please check my comments

@@ -74,6 +74,13 @@ func (e *gitOpsEngine) Sync(ctx context.Context,
namespace string,
opts ...sync.SyncOpt,
) ([]common.ResourceSyncResult, error) {
// Ensure cache is synced before getting managed live objects
// This forces a refresh if the cache was invalidated
err := e.cache.EnsureSynced()
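
The quoted hunk ends mid-statement; presumably the added code goes on to check the returned error. A sketch of the likely continuation (assumed, not the PR's exact code):

err := e.cache.EnsureSynced()
if err != nil {
	// Hypothetical wrapping; the actual error message in the PR may differ.
	return nil, fmt.Errorf("failed to ensure cluster cache is synced: %w", err)
}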
Contributor

I believe this function was designed to be called only when the gitops engine is being initialized. It calls the clusterCacheSync.synced function, which just validates based on the sync time. The ClusterCache should always be synced because it updates based on resource watches.
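
For context, a rough sketch of the pattern the reviewer is describing (hypothetical field and helper names, not the actual gitops-engine implementation): EnsureSynced performs a full resync only once, or again after an invalidation, and watches keep the cache current in between.

func (c *clusterCache) EnsureSynced() error {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.synced { // hypothetical flag, cleared when the cache is invalidated
		return nil
	}
	// Full resync: list all resources once, then start watches that keep
	// the cache up to date incrementally.
	if err := c.listAndStartWatches(); err != nil { // hypothetical helper
		return err
	}
	c.synced = true
	return nil
}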

Contributor Author

After looking at the argo-cd code more, it seems like GetManagedLiveObjs() actually calls c.getSyncedCluster(destCluster), which eventually calls clusterCache.EnsureSynced() anyway.

So this part is definitely not needed.

Comment on lines +96 to +101
opts = append(opts, sync.WithCacheInvalidationCallback(func(modifiedResources []kube.ResourceKey) {
// Only invalidate the specific resources that were modified
if len(modifiedResources) > 0 {
e.cache.InvalidateResources(modifiedResources)
}
}))
Contributor

Why would we need to invalidate the cache on every sync if the cluster cache is based on resource watches? I feel that the root problem is elsewhere. WDYT?

Contributor Author

Yeah, it seems like this solution reactively removes cached resources and is more of a workaround.

The problem is more likely in watchEvents not properly updating resources, e.g. when a CRD's schema changes.
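
A rough sketch of the direction hinted at here (the processEvent shape loosely mirrors gitops-engine's watch handling; invalidateSchemaCache is a hypothetical helper):

// React to CRD changes inside the watch event handler instead of
// invalidating resource entries after every sync.
func (c *clusterCache) processEvent(event watch.EventType, un *unstructured.Unstructured) {
	if kube.IsCRD(un) {
		// A CRD add/update/delete changes the OpenAPI schema used for
		// diffing, so the cached schema must be refreshed too, not just
		// the CRD's own resource entry.
		c.invalidateSchemaCache() // hypothetical
	}
	// ... existing per-resource bookkeeping ...
}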

Comment on lines +573 to +580
// Invalidate cache after successful sync
if sc.cacheInvalidationCallback != nil {
modifiedResources := make([]kubeutil.ResourceKey, 0, len(sc.syncRes))
for _, result := range sc.syncRes {
modifiedResources = append(modifiedResources, result.ResourceKey)
}
sc.cacheInvalidationCallback(modifiedResources)
}
Contributor

Same comment applies:

Why would we need to invalidate the cache on every sync if the cluster cache is based on resource watches? I feel that the root problem is elsewhere. WDYT?

Successfully merging this pull request may close these issues.

ServerSide Diff failing for CRs when a new field is added to the CRD: field not declared in schema