[8.17](backport #6623) [k8s] Fix logical race conditions in kubernetes_secrets provider #6795
Conversation
* fix: refactor kubernetes_secrets provider to eliminate race conditions
* fix: add changelog fragment and unit-tests for kubernetes_secrets provider
* fix: replace RWMutex with Mutex
* fix: rename newExpirationStore to newExpirationCache
* fix: introduce kubernetes_secrets provider name as a const
* fix: extend AddConditionally doc string to describe the case of condition is nil
* fix: gosec lint

(cherry picked from commit 6d4b91c)

# Conflicts:
#	internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go
Cherry-pick of 6d4b91c has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
What does this PR do?
This PR refactors the implementation of the kubernetes_secrets provider to eliminate logical race conditions, and adds brand-new unit tests.

Initially, the issue seemed to stem from misuse or lack of synchronisation primitives, but after deeper analysis it became evident that the "race" conditions were logical rather than concurrency-related. The existing implementation was structured in a way that led to inconsistencies due to overlapping responsibilities of the different actors managing the secret lifecycle.
To address this, I restructured the logic while keeping in mind the constraints of the existing provider, specifically:
With this in mind, the provider behaviour is now as follows:
Cache Disabled Mode:
Cache Enabled Mode:
Cache Actor Responsibilities:
- When a secret's value has changed, store the new value and refresh lastAccess, to prevent premature expiration and give the fetch actor time to pick up the new value.
- When the value is unchanged, do not refresh lastAccess and let the entry "age" as it should.

Fetch Actor Responsibilities:
- Refresh lastAccess when an entry is accessed, to prevent unintended expiration (see the sketch below).
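Below is a minimal Go sketch of the conditional-insertion expiration cache these responsibilities imply. The names expirationCache and AddConditionally come from the commit messages above, but the signatures, fields, and key format are illustrative assumptions, not the provider's actual code.

```go
// Illustrative sketch only: field names and signatures are assumptions,
// not the actual kubernetes_secrets provider implementation.
package kubernetessecrets

import (
	"sync"
	"time"
)

type cacheEntry struct {
	value      string
	lastAccess time.Time
}

type expirationCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]cacheEntry
}

// AddConditionally inserts or replaces the entry for key only when condition
// returns true; a nil condition means "always insert". updateAccess controls
// whether lastAccess is refreshed, so the cache actor can leave it untouched
// for unchanged values and let the entry age out.
func (c *expirationCache) AddConditionally(key, value string, updateAccess bool, condition func(existing string, exists bool) bool) {
	c.mu.Lock()
	defer c.mu.Unlock()

	existing, exists := c.items[key]
	if condition != nil && !condition(existing.value, exists) {
		return
	}
	entry := cacheEntry{value: value, lastAccess: existing.lastAccess}
	if updateAccess || !exists {
		entry.lastAccess = time.Now()
	}
	c.items[key] = entry
}

// Get returns the cached value and refreshes lastAccess, mirroring the fetch
// actor's responsibility to keep actively used entries alive.
func (c *expirationCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()

	entry, ok := c.items[key]
	if !ok || time.Since(entry.lastAccess) > c.ttl {
		delete(c.items, key)
		return "", false
	}
	entry.lastAccess = time.Now()
	c.items[key] = entry
	return entry.value, true
}
```

The important design point is that insertion is gated by a caller-supplied condition, so the cache actor can decline to overwrite an entry the fetch actor has just written, and can skip refreshing lastAccess for values that have not changed.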
Considerations:
- The ExpirationCache from k8s.io/client-go/tools/cache does not suit our needs, as it lacks the aforementioned conditional insertion required to handle these interactions correctly.

PS: as the main changes of this PR are captured by the commit a549728, I consider this PR to be aligned with the Pull Requests policy.
Why is it important?
This refactor significantly improves the correctness of the kubernetes_secrets provider by ensuring:

Checklist
- Changelog entry added in ./changelog/fragments using the change-log tool

Disruptive User Impact
This change does not introduce breaking changes but ensures that the kubernetes_secrets provider operates correctly in cache-enabled mode. Users relying on cache behaviour may notice improved stability in secret retrieval.

How to test this PR locally
go test ./internal/pkg/composable/providers/kubernetessecrets/...
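For a quick illustration of the conditional insert, here is a hypothetical unit-test sketch against the expirationCache example above; it is not part of the provider's actual test suite, which lives in the kubernetessecrets package and covers the real implementation.

```go
// Hypothetical test sketch for the expirationCache example above; not part of
// the provider's actual test suite.
package kubernetessecrets

import (
	"testing"
	"time"
)

func TestAddConditionally_RespectsCondition(t *testing.T) {
	c := &expirationCache{ttl: time.Minute, items: map[string]cacheEntry{}}

	// Initial insert: a nil condition means "always insert".
	c.AddConditionally("ns/secret/key", "v1", true, nil)

	// Simulate the cache actor declining to overwrite an entry.
	never := func(existing string, exists bool) bool { return false }
	c.AddConditionally("ns/secret/key", "stale", false, never)

	if got, ok := c.Get("ns/secret/key"); !ok || got != "v1" {
		t.Fatalf("expected v1 to survive, got %q (ok=%v)", got, ok)
	}
}
```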
Related issues
This is an automatic backport of pull request #6623 done by [Mergify](https://mergify.com).