-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THREESCALE-11395 Improved secret watched-by logic #1030
Conversation
1b639f9
to
fad7348
Compare
786f03d
to
1f3285e
Compare
|
||
* [**recommended way**] Create another secret with a different name and update the APIManager custom resource field `customEnvironments[].secretRef.name`. The operator will trigger a rolling update loading the new custom environment content. | ||
* Update the existing secret content and redeploy apicast turning `spec.apicast.productionSpec.replicas` or `spec.apicast.stagingSpec.replicas` to 0 and then back to the previous value. | ||
**NOTE**: If the referenced secret does not exist, the operator will mark the APIManager CustomResource as failed. The apicast Deployment object will also fail if the referenced secret does not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @carlkyrillos . I did test as in Description and all looks fine. Code review is not completed, but meanwhile looks very good. Just this behavior - I'm not sure. I tried delete secret, but apimanager CR was not updated, althouh reconciler reporting error.
- Pods status - all UP and running ok
- Delete secret:
oc delete secret custom-env-1
- Error in reconciler:
2024-11-26T15:54:30+02:00 ERROR Reconciler error {"controller": "apimanager", "controllerGroup": "apps.3scale.net", "controllerKind": "APIManager", "APIManager": {"name":"3scale","namespace":"3scale-test"}, "namespace": "3scale-test", "name": "3scale", "reconcileID": "151a6cea-64bd-40cd-8f1b-cf8ad6be102c", "error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found", "errorCauses": [{"error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found"}]}
- No error in CR:
~/go/3scale-operator oc describe apimanager 3scale |grep -i error
~/go/3scale-operator oc describe apimanager 3scale |grep -i warn
~/go/3scale-operator oc describe apimanager 3scale |grep Status
Status:
Status: True
Status: True
~/go/3scale-operator oc describe apimanager 3scale |grep Message
Message: All requirements for the current version are met
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valerymo You are right this is a bug, thank you for catching this. I'll take a look at the code and will push a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MStokluska After thinking about this some more I'm second guessing my original logic. Are we sure we want to fail the APIManager CR if a secret referenced in the .spec
is missing? Would it be better to just fail the deployment that is using the watched secret? That way we wouldn't be holding up the entire installation, just the one component relying on the watched secret. CC: @valerymo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I understand correctly, if a secret referenced missing in APIM we are setting APIM as not ready?
If so, I see no objection in setting apim via not ready deployment to not ready as well. Especially that APIM doesn't log "lastError". But I have not looked at the PR so I might be missing some context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MStokluska @valerymo I pushed a commit that checks to see if any of the watched secrets can't be found. If any watched secrets are missing, then the Available
Condition in the APIManager CR's .status
will switch to false
and I added a message to the condition that lists which secrets couldn't be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlkyrillos Retest done after commit 6 (latest).
-
I retested all as described in Validation notes, and it looks fine, but i did few additional tests for commit 6, as below, and there are few questions, can you please look.
Test 1
- 3scale - is running (as described in validation/description), looks good.
- Delete both custom secrets:
oc delete secret custom-env-1 custom-policy-1
- Check Apimanager CR and deployments :
- Message appears in apim CR:
Message: The following secret(s) could not be found: custom-env-1, custom-policy-1
. This is looks good. - All deployments - UP, apicast pods - are not recreated (?). I'm not sure if it's ok, but wanted to ask you. As even secrets were removed - apicast pods are running without restart, and so have references to non existing secrets.
- Message appears in apim CR:
Test 2
- install 3scale from scratch. Secrets
custom-env-1
andcustom-policy-1
are missing, but defined in apimanager CR for APIcast. - Check installtion:
- Both apicast deployments/pods are missing (?)
- Preflight appears as successfull (?) although apicast deployments missing
- Apimanager has Message thet secrets are missing (this is fine)
Carl, could you please take a look. I'm not entirely sure of some logic/behavior of operator, that we see in Tests 1 & 2. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valerymo Thanks for re-verifying the PR. For Test 2
, this is expected behavior: the apicast pods won't start if they're relying on a secret that doesn't exist and as long as the APIManager has a message that the secrets are missing we should be fine in that scenario.
As for Test 1
that may or may not be a problem. Since we're failing the APIManager when the secrets are deleted, the user should be aware that there is a problem even if the apicast Deployments are still healthy. @MStokluska do we want to also fail the apicast Deployments in this scenario - i.e. when the apicast deployments were originally created the referenced custom secrets did exist but then the secrets were deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud..the way I would see this / expect this work is as follows:
- When I reference a secret in APIM and the secret DOES NOT EXIST - the APIM should fail but deployment should continue as it is, just without the secret
- When I reference a secret in APIM and I remove the secret - Ideally, I'd like a finalizer to block me from doing so - the reason is because my current installation depends on that secret. And yes, although removing the secret won't have an immediate effect on the deployment (as in, it will still work with the secret mounted) - IMO we should not be breaking the self-healing mechanism and prevent breaking it when possible. Since operator watches the secret, it could also set a finalizer on it to ensure the secret isn't removed accidentally.
Alternative approach IMO should be shutting down the deployment if it relies on secret that was deleted. Reason for this IMO is it's better to fail immediately when issue occurs rather than giving false sense of correct configuration...
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to your first bullet point: I need to double check because it's been awhile since I tested but I believe the PR is behaving in the way you're describing - so we should be set there.
Responding to your second bullet: I think adding a finalizer to the secret is the cleanest solution but do we want to add the finalizer to any user-created secret that is specifiied in the APIManager .spec or only to secrets that are specifiecd in the .spec and have the watched-by
label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the second point - IMO, the finalizer should be added to watched-by only.
} | ||
|
||
for key, val := range apicast.Options.ProductionAdditionalPodAnnotations { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we allow to specify annotations via apim?
If so, are we removing this functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we do allow users to specify annotations that get injected into the Pod but not into the Deployment. The watched secret annotations get injected into the Deployment but not the Pod so there's no risk of them overwriting one another. So we can keep the existing functionality without changes.
@@ -129,9 +129,6 @@ type ApicastOptions struct { | |||
|
|||
ProductionServiceCacheSize *int32 | |||
StagingServiceCacheSize *int32 | |||
|
|||
StagingAdditionalPodAnnotations map[string]string `validate:"required"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we do allow users to specify annotations that get injected into the Pod but not into the Deployment. The watched secret annotations get injected into the Deployment but not the Pod so there's no risk of them overwriting one another. So we can keep the existing functionality without changes.
55b3bb3
to
b9a2d2e
Compare
Issue Link
JIRA: THREESCALE-11395
What
This PR improves on the existing secret watched-by logic in a few ways:
Previously the secret labels on the APIManager CR followed this pattern:
secret.apimanager.apps.3scale.net/{secret-UID: 'true'
, where the value was set totrue
regardless of whether the secret had thewatched-by
label. Now the value is set tofalse
if the Secret doesn't have thewatched-by
label andtrue
if the secret does have thewatched-by
label.Previously the operator would annotate the apicast pod(s) with
apimanager.apps.3scale.net/{secret-name}: '{secret-resourceVersion}'
whenever a secret was referenced in the APIManager CR. Now the apicast pod(s) are only annotated if the referenced secret also has thewatched-by
label on it.Previously, any and all changes to a watched secret would trigger the apicast deployment(s) to rollout a new pod(s) whose annotations contained the latest resourceVersion of the watched secret. Now the operator will only rollout new pods when there is a change to the secret's
.data
, i.e. changes to the watched secrets labels, annotations, etc. won't trigger a rollout even if though the secret's resourceVersion changed. This is accomplished through a new secret calledhashed-secret-data
that stores a hash of each watched secret's data using SHA256 encryption. Whenever the operator detects that a watched secret's resourceVersion has changed, it first takes a hash of the secret's current.data
and if that matches the secret's entry in thehashed-secret-data
secret, then the operator will ignore the change and prevent a rollout. If the.data
has changed, the operator will update the apicast pod's annotations with the watched secret's latest resourceVersion which will trigger a rollout.NOTE: This PR doesn't implement support to watch TLS or ACL secrets. Support to watch these secrets will be added in a followup PR once #1025 and #1026 are merged.
Verification Steps
custom-policy-1
andcustom-env-1
secrets:custom-env-1
label value set totrue
and thecustom-policy-1
label value set tofalse
:The labels should look like this:
apicast-staging
andapicast-productions
pods' annotations have a references to thecustom-env-1
secret but not thecustom-policy-1
secret:The annotations for both pods should look like this:
NOTE: The
env-configmap-hash
annotation is supposed to be there and is not related to the watched-by secrets.hashed-secret-data
secret exists but only has an entry for thecustom-env-1
secret:watched-by
label to thecustom-policy-1
secret:true
:The labels should now look like this:
apicast-staging
andapicast-production
pods are ready, verify that they have annotations with references to both thecustom-env-1
secret and thecustom-policy-1
secret:The annotations should look like this:
NOTE: The
env-configmap-hash
annotation is supposed to be there and is not related to the watched-by secrets.hashed-secret-data
secret has an entry for both thecustom-env-1
andcustom-policy-1
secrets:apicast-staging
andapicast-production
pods are created: