THREESCALE-11395 Improved secret watched-by logic #1030

carlkyrillos · 2024-11-05T18:42:00Z

Issue Link

What

This PR improves on the existing secret watched-by logic in a few ways:

Previously the secret labels on the APIManager CR followed this pattern: secret.apimanager.apps.3scale.net/{secret-UID}: 'true', where the value was set to true regardless of whether the secret had the watched-by label. Now the value is set to false if the secret doesn't have the watched-by label and true if it does.
Previously the operator would annotate the apicast pod(s) with apimanager.apps.3scale.net/{secret-name}: '{secret-resourceVersion}' whenever a secret was referenced in the APIManager CR. Now the apicast pod(s) are only annotated if the referenced secret also has the watched-by label on it.
Previously, any change to a watched secret would trigger the apicast deployment(s) to roll out new pod(s) whose annotations contained the latest resourceVersion of the watched secret. Now the operator only rolls out new pods when the secret's .data changes, i.e. changes to the watched secret's labels, annotations, etc. won't trigger a rollout even though the secret's resourceVersion changed. This is accomplished through a new secret called hashed-secret-data that stores a SHA256 hash of each watched secret's data. Whenever the operator detects that a watched secret's resourceVersion has changed, it first hashes the secret's current .data; if that hash matches the secret's entry in the hashed-secret-data secret, the operator ignores the change and no rollout occurs. If the .data has changed, the operator updates the apicast pod's annotations with the watched secret's latest resourceVersion, which triggers a rollout.
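
The hash comparison described in the last point can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the function names hashSecretData and needsRollout, and the exact way keys and values are fed into the hash, are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashSecretData (hypothetical helper) returns a deterministic SHA-256 digest
// of a secret's data map. Keys are sorted first because Go map iteration
// order is not stable.
func hashSecretData(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each key and value so that different key/value
		// splits of the same bytes do not collide.
		fmt.Fprintf(h, "%d:%s=%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRollout (hypothetical helper) reports whether the stored hash differs
// from the hash of the secret's current data.
func needsRollout(storedHash string, current map[string][]byte) bool {
	return storedHash != hashSecretData(current)
}

func main() {
	data := map[string][]byte{"custom_env.lua": []byte("return {}")}
	stored := hashSecretData(data)

	// A label-only change leaves the data untouched: no rollout.
	fmt.Println(needsRollout(stored, data))

	// Changing the data yields a different hash: rollout.
	data["custom_env.lua"] = []byte("return { port = {} }")
	fmt.Println(needsRollout(stored, data))
}
```

Comparing a digest of .data rather than the resourceVersion is what lets label and annotation edits pass without a rollout, since those edits bump the resourceVersion but leave the digest unchanged.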

NOTE: This PR doesn't implement support to watch TLS or ACL secrets. Support to watch these secrets will be added in a followup PR once #1025 and #1026 are merged.

Verification Steps

Checkout this PR
Prepare the cluster for a local install:

make download
make install

export NAMESPACE=3scale-test
oc new-project $NAMESPACE

cat << EOF | oc create -f -
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: $NAMESPACE
data:
  AWS_ACCESS_KEY_ID: c29tZXRoaW5nCg==
  AWS_BUCKET: c29tZXRoaW5nCg==
  AWS_REGION: dXMtd2VzdC0xCg==
  AWS_SECRET_ACCESS_KEY: c29tZXRoaW5nCg==
type: Opaque
EOF

Create a custom policy secret (without the watched-by label):

cat << EOF | oc create -f -
apiVersion: v1
kind: Secret
metadata:
  name: custom-policy-1
  namespace: $NAMESPACE
type: Opaque
stringData:
  apicast-policy.json: |
    {
      "services": [
        {
          "proxy": {
            "policy_chain": [
              { "name": "apicast.policy.upstream",
                "configuration": {
                  "rules": [{
                    "regex": "/",
                    "url": "http://echo-api.3scale.net"
                  }]
                }
              }
            ]
          }
        }
      ]
    }
  example.lua: |
    local setmetatable = setmetatable

    local _M = require('apicast.policy').new('Example', '0.1')
    local mt = { __index = _M }
    
    function _M.new()
      return setmetatable({}, mt)
    end
    
    function _M:init()
      -- do work when nginx master process starts
    end
    
    function _M:init_worker()
      -- do work when nginx worker process is forked from master
    end
    
    function _M:rewrite()
      -- change the request before it reaches upstream
        ngx.req.set_header('X-CustomPolicy', 'customValue')
    end
    
    function _M:access()
      -- ability to deny the request before it is sent upstream
    end
    
    function _M:content()
      -- can create content instead of connecting to upstream
    end
    
    function _M:post_action()
      -- do something after the response was sent to the client
    end
    
    function _M:header_filter()
      -- can change response headers
    end
    
    function _M:body_filter()
      -- can read and change response body
      -- https://github.com/openresty/lua-nginx-module/blob/master/README.markdown#body_filter_by_lua
    end
    
    function _M:log()
      -- can do extra logging
    end
    
    function _M:balancer()
      -- use for example require('resty.balancer.round_robin').call to do load balancing
    end
    
    return _M
  init.lua: |
    return require('example')
EOF

Create a custom environment secret (with the watched-by label):

cat << EOF | oc create -f -
apiVersion: v1
kind: Secret
metadata:
  name: custom-env-1
  namespace: $NAMESPACE
  labels:
    apimanager.apps.3scale.net/watched-by: apimanager
type: Opaque
stringData:
  custom_env.lua: |
    local cjson = require('cjson')
    local PolicyChain = require('apicast.policy_chain')
    local policy_chain = context.policy_chain
    
    local logging_policy_config = cjson.decode([[
    {
      "enable_access_logs": false,
      "custom_logging": "\"{{request}}\" to service {{service.id}} and {{service.name}}"
    }
    ]])
    
    policy_chain:insert( PolicyChain.load_policy('logging', 'builtin', logging_policy_config), 1)
    
    return {
      policy_chain = policy_chain,
      port = { metrics = 9421 },
    }
EOF

Create an APIManager CR that references the custom-policy-1 and custom-env-1 secrets:

DOMAIN=$(oc get routes console -n openshift-console -o json | jq -r '.status.ingress[0].routerCanonicalHostname' | sed 's/router-default.//')
cat << EOF | oc create -f -
kind: APIManager
apiVersion: apps.3scale.net/v1alpha1
metadata:
  name: 3scale
  namespace: $NAMESPACE
spec:
  wildcardDomain: $DOMAIN
  system:
    fileStorage:
      simpleStorageService:
        configurationSecretRef:
          name: s3-credentials
  apicast:
    productionSpec:
      customPolicies:
        - name: custom-policy1
          version: "0.1"
          secretRef:
            name: custom-policy-1
      customEnvironments:
        - secretRef:
            name: custom-env-1
    stagingSpec:
      customPolicies:
        - name: custom-policy1
          version: "0.1"
          secretRef:
            name: custom-policy-1
      customEnvironments:
        - secretRef:
            name: custom-env-1
EOF

Run the operator:

make run

Wait for the install to complete:

oc get apimanager 3scale -oyaml -w

Verify that the APIManager's labels reference the two secrets' UIDs with the custom-env-1 label value set to true and the custom-policy-1 label value set to false:

oc get apimanager 3scale -oyaml | yq '.metadata.labels'

The labels should look like this:

secret.apimanager.apps.3scale.net/6fd57b4c-2811-43a7-bcb2-8c84a8c70e43: "true"
secret.apimanager.apps.3scale.net/5691d37c-3707-4073-ab92-581302a78a2a: "false"
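
For illustration, label values like the ones above could be derived as in the sketch below. The label prefix and the watched-by label key come from this PR description; the secretInfo type and the apimanagerSecretLabels function are hypothetical, and checking only for the presence of the watched-by label (rather than its value) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	secretLabelPrefix = "secret.apimanager.apps.3scale.net/"
	watchedByLabel    = "apimanager.apps.3scale.net/watched-by"
)

// secretInfo is a hypothetical, trimmed-down view of a Secret.
type secretInfo struct {
	UID    string
	Labels map[string]string
}

// apimanagerSecretLabels (hypothetical helper) builds one APIManager label
// per referenced secret: "true" if the secret carries the watched-by label,
// "false" otherwise.
func apimanagerSecretLabels(secrets []secretInfo) map[string]string {
	out := make(map[string]string, len(secrets))
	for _, s := range secrets {
		_, watched := s.Labels[watchedByLabel]
		out[secretLabelPrefix+s.UID] = strconv.FormatBool(watched)
	}
	return out
}

func main() {
	labels := apimanagerSecretLabels([]secretInfo{
		{UID: "6fd57b4c-2811-43a7-bcb2-8c84a8c70e43", Labels: map[string]string{watchedByLabel: "apimanager"}},
		{UID: "5691d37c-3707-4073-ab92-581302a78a2a"}, // no watched-by label
	})
	for k, v := range labels {
		fmt.Printf("%s: %q\n", k, v)
	}
}
```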

Verify that the apicast-staging and apicast-production pods' annotations have a reference to the custom-env-1 secret but not the custom-policy-1 secret:

oc get pods -l deployment=apicast-staging -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

oc get pods -l deployment=apicast-production -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

The annotations for both pods should look like this:

apimanager.apps.3scale.net/customenv-secret-resource-version-custom-env-1: "944031"
apimanager.apps.3scale.net/env-configmap-hash: "1046001186"

NOTE: The env-configmap-hash annotation is supposed to be there and is not related to the watched-by secrets.

Verify that the hashed-secret-data secret exists but only has an entry for the custom-env-1 secret:

oc get secret hashed-secret-data -oyaml

Add the watched-by label to the custom-policy-1 secret:

oc label secret custom-policy-1 apimanager.apps.3scale.net/watched-by=apimanager

Verify that the APIManager labels were updated and that both secret labels have a value of true:

oc get apimanager 3scale -oyaml | yq '.metadata.labels'

The labels should now look like this:

secret.apimanager.apps.3scale.net/6fd57b4c-2811-43a7-bcb2-8c84a8c70e43: "true"
secret.apimanager.apps.3scale.net/5691d37c-3707-4073-ab92-581302a78a2a: "true"

Once the new apicast-staging and apicast-production pods are ready, verify that they have annotations with references to both the custom-env-1 secret and the custom-policy-1 secret:

oc get pods -l deployment=apicast-staging -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

oc get pods -l deployment=apicast-production -oyaml | yq '.items[0].metadata.annotations' | grep apimanager.apps.3scale.net

The annotations should look like this:

apimanager.apps.3scale.net/customenv-secret-resource-version-custom-env-1: "944031"
apimanager.apps.3scale.net/custompolicy-secret-resource-version-custom-policy-1: "954907"
apimanager.apps.3scale.net/env-configmap-hash: "1046001186"

NOTE: The env-configmap-hash annotation is supposed to be there and is not related to the watched-by secrets.

Verify that the hashed-secret-data secret has an entry for both the custom-env-1 and custom-policy-1 secrets:

oc get secret hashed-secret-data -oyaml

Edit the data in either secret and verify that new apicast-staging and apicast-production pods are created:

oc get pods

Once the pods have stabilized, add a label to either secret and verify that no new pods are created even though the resourceVersion has changed:

oc label secret custom-env-1 dummy=label

valerymo · 2024-11-26T14:03:50Z

doc/adding-apicast-custom-environments.md

-
-* [**recommended way**] Create another secret with a different name and update the APIManager custom resource field `customEnvironments[].secretRef.name`. The operator will trigger a rolling update loading the new custom environment content.
-* Update the existing secret content and redeploy apicast turning `spec.apicast.productionSpec.replicas` or `spec.apicast.stagingSpec.replicas` to 0 and then back to the previous value.
+**NOTE**: If the referenced secret does not exist, the operator will mark the APIManager CustomResource as failed. The apicast Deployment object will also fail if the referenced secret does not exist.


Hi @carlkyrillos. I tested as in the description and all looks fine. Code review is not completed, but so far it looks very good. I'm just not sure about this behavior: I tried deleting the secret, but the APIManager CR was not updated, although the reconciler is reporting an error.

Pods status - all UP and running ok

Delete secret: oc delete secret custom-env-1

Error in reconciler:

2024-11-26T15:54:30+02:00 ERROR Reconciler error {"controller": "apimanager", "controllerGroup": "apps.3scale.net", "controllerKind": "APIManager", "APIManager": {"name":"3scale","namespace":"3scale-test"}, "namespace": "3scale-test", "name": "3scale", "reconcileID": "151a6cea-64bd-40cd-8f1b-cf8ad6be102c", "error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found", "errorCauses": [{"error": "spec.apicast.productionSpec.customEnvironments[0]: Invalid value: v1alpha1.CustomEnvironmentSpec{SecretRef:(*v1.LocalObjectReference)(0x14000f8f7e0)}: Secret \"custom-env-1\" not found"}]}

No error in CR:

~/go/3scale-operator oc describe apimanager 3scale |grep -i error
~/go/3scale-operator oc describe apimanager 3scale |grep -i warn
~/go/3scale-operator oc describe apimanager 3scale |grep Status
Status:
Status: True
Status: True
~/go/3scale-operator oc describe apimanager 3scale |grep Message
Message: All requirements for the current version are met

Thank you

@valerymo You are right this is a bug, thank you for catching this. I'll take a look at the code and will push a fix.

@MStokluska After thinking about this some more I'm second guessing my original logic. Are we sure we want to fail the APIManager CR if a secret referenced in the .spec is missing? Would it be better to just fail the deployment that is using the watched secret? That way we wouldn't be holding up the entire installation, just the one component relying on the watched secret. CC: @valerymo

If I understand correctly, if a secret referenced in the APIM is missing, we are setting the APIM as not ready?
If so, I have no objection to setting the APIM to not ready via the not-ready deployment as well, especially since the APIM doesn't log "lastError". But I have not looked at the PR, so I might be missing some context.

@MStokluska @valerymo I pushed a commit that checks to see if any of the watched secrets can't be found. If any watched secrets are missing, then the Available Condition in the APIManager CR's .status will switch to false and I added a message to the condition that lists which secrets couldn't be found.
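
As a rough illustration of the check described in the comment above (not the actual commit), the availability decision and message could be built like this. The function name missingSecretsMessage and the existence map are hypothetical stand-ins for real cluster lookups; the message wording follows the text later reported in the APIManager CR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingSecretsMessage (hypothetical helper) returns whether the Available
// condition should stay true and, if not, a message listing the secrets
// that could not be found.
func missingSecretsMessage(referenced []string, existing map[string]bool) (bool, string) {
	var missing []string
	for _, name := range referenced {
		if !existing[name] {
			missing = append(missing, name)
		}
	}
	if len(missing) == 0 {
		return true, ""
	}
	sort.Strings(missing) // stable ordering keeps the condition message from flapping
	return false, "The following secret(s) could not be found: " + strings.Join(missing, ", ")
}

func main() {
	// Only custom-env-1 exists in this hypothetical namespace.
	existing := map[string]bool{"custom-env-1": true}
	available, msg := missingSecretsMessage([]string{"custom-env-1", "custom-policy-1"}, existing)
	fmt.Println(available, msg)
}
```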

@carlkyrillos Retest done after commit 6 (latest).

I retested everything as described in the validation notes, and it looks fine, but I did a few additional tests for commit 6, as below, and I have a few questions. Can you please take a look?

Test 1

3scale - is running (as described in validation/description), looks good.

Delete both custom secrets: oc delete secret custom-env-1 custom-policy-1

Check Apimanager CR and deployments :

Message appears in the APIManager CR: Message: The following secret(s) could not be found: custom-env-1, custom-policy-1. This looks good.

All deployments are UP, and the apicast pods are not recreated (?). I'm not sure if that's OK, so I wanted to ask you. Even though the secrets were removed, the apicast pods keep running without a restart, and so hold references to non-existent secrets.

Test 2

Install 3scale from scratch. Secrets custom-env-1 and custom-policy-1 are missing, but defined in the APIManager CR for APIcast.

Check installation:

Both apicast deployments/pods are missing (?)

Preflight appears as successful (?) although the apicast deployments are missing

APIManager has a Message that the secrets are missing (this is fine)

Carl, could you please take a look? I'm not entirely sure about some of the operator's logic/behavior that we see in Tests 1 & 2. Thank you

@valerymo Thanks for re-verifying the PR. For Test 2, this is expected behavior: the apicast pods won't start if they're relying on a secret that doesn't exist and as long as the APIManager has a message that the secrets are missing we should be fine in that scenario.

As for Test 1 that may or may not be a problem. Since we're failing the APIManager when the secrets are deleted, the user should be aware that there is a problem even if the apicast Deployments are still healthy. @MStokluska do we want to also fail the apicast Deployments in this scenario - i.e. when the apicast deployments were originally created the referenced custom secrets did exist but then the secrets were deleted?

Just thinking out loud..the way I would see this / expect this work is as follows:

When I reference a secret in APIM and the secret DOES NOT EXIST - the APIM should fail but deployment should continue as it is, just without the secret

When I reference a secret in APIM and I remove the secret - Ideally, I'd like a finalizer to block me from doing so - the reason is because my current installation depends on that secret. And yes, although removing the secret won't have an immediate effect on the deployment (as in, it will still work with the secret mounted) - IMO we should not be breaking the self-healing mechanism and prevent breaking it when possible. Since operator watches the secret, it could also set a finalizer on it to ensure the secret isn't removed accidentally.
Alternative approach IMO should be shutting down the deployment if it relies on secret that was deleted. Reason for this IMO is it's better to fail immediately when issue occurs rather than giving false sense of correct configuration...

WDYT?

Responding to your first bullet point: I need to double check because it's been a while since I tested, but I believe the PR is behaving in the way you're describing, so we should be set there.

Responding to your second bullet: I think adding a finalizer to the secret is the cleanest solution, but do we want to add the finalizer to any user-created secret that is specified in the APIManager .spec, or only to secrets that are specified in the .spec and have the watched-by label?

For the second point - IMO, the finalizer should be added to watched-by only.
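
To make the proposal concrete, here is a sketch of what "finalizer on watched-by secrets only" could look like. It reflects the idea discussed in this thread, not something this PR implements: the finalizer name and the function withFinalizerIfWatched are hypothetical.

```go
package main

import "fmt"

const (
	watchedByLabel = "apimanager.apps.3scale.net/watched-by"
	// Hypothetical finalizer name, chosen only for this example.
	secretFinalizer = "apimanager.apps.3scale.net/watched-secret"
)

// withFinalizerIfWatched (hypothetical helper) returns the finalizer list a
// secret should carry: unchanged for secrets without the watched-by label,
// and with the finalizer added exactly once for watched secrets.
func withFinalizerIfWatched(labels map[string]string, finalizers []string) []string {
	if _, watched := labels[watchedByLabel]; !watched {
		return finalizers
	}
	for _, f := range finalizers {
		if f == secretFinalizer {
			return finalizers // already present
		}
	}
	// Copy before appending so the caller's slice is never modified.
	out := append([]string{}, finalizers...)
	return append(out, secretFinalizer)
}

func main() {
	watched := map[string]string{watchedByLabel: "apimanager"}
	fmt.Println(withFinalizerIfWatched(watched, nil))
	fmt.Println(withFinalizerIfWatched(map[string]string{}, nil))
}
```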

MStokluska · 2025-01-14T12:37:33Z

pkg/3scale/amp/component/apicast.go

 	}

-	for key, val := range apicast.Options.ProductionAdditionalPodAnnotations {


Did we allow to specify annotations via apim?
If so, are we removing this functionality?

Yes we do allow users to specify annotations that get injected into the Pod but not into the Deployment. The watched secret annotations get injected into the Deployment but not the Pod so there's no risk of them overwriting one another. So we can keep the existing functionality without changes.

MStokluska · 2025-01-14T12:37:54Z

pkg/3scale/amp/component/apicast_options.go

@@ -129,9 +129,6 @@ type ApicastOptions struct {

 	ProductionServiceCacheSize *int32
 	StagingServiceCacheSize    *int32
-
-	StagingAdditionalPodAnnotations    map[string]string `validate:"required"`


same question as above

Yes we do allow users to specify annotations that get injected into the Pod but not into the Deployment. The watched secret annotations get injected into the Deployment but not the Pod so there's no risk of them overwriting one another. So we can keep the existing functionality without changes.

carlkyrillos requested a review from a team as a code owner November 5, 2024 18:42

openshift-ci bot added the do-not-merge/work-in-progress label Nov 5, 2024

carlkyrillos force-pushed the THREESCALE-11395 branch 3 times, most recently from 1b639f9 to fad7348 Compare November 7, 2024 21:25

carlkyrillos changed the title ~~[WIP] THREESCALE-11395 Improved secret watched-by logic~~ THREESCALE-11395 Improved secret watched-by logic Nov 7, 2024

openshift-ci bot removed the do-not-merge/work-in-progress label Nov 7, 2024

carlkyrillos force-pushed the THREESCALE-11395 branch from 786f03d to 1f3285e Compare November 20, 2024 14:31

valerymo reviewed Nov 26, 2024

MStokluska reviewed Jan 14, 2025

carlkyrillos added 6 commits January 14, 2025 10:15

THREESCALE-11395 Fixed true/false logic on APIManager labels

051fa8b

THREESCALE-11395 Added hashed secret

4a7b904

THREESCALE-11395 Added support to watch apicast-related secrets

94ea6c6

THREESCALE-11395 Updated e2e tests and added more unit tests

1013c9f

THREESCALE-11395 Updated docs with watched secret changes

4f28a70

THREESCALE-11395 Fixed APIManager to fail when watched secret is missing

b9a2d2e

carlkyrillos force-pushed the THREESCALE-11395 branch from 55b3bb3 to b9a2d2e Compare January 14, 2025 15:24

carlkyrillos merged commit ebcdc1b into 3scale:master Jan 14, 2025
15 checks passed

valerymo Nov 26, 2024

carlkyrillos Nov 26, 2024

carlkyrillos Nov 26, 2024

MStokluska Nov 27, 2024

carlkyrillos Dec 3, 2024

valerymo Jan 5, 2025

carlkyrillos Jan 6, 2025

MStokluska Jan 8, 2025

carlkyrillos Jan 8, 2025

MStokluska Jan 9, 2025

MStokluska Jan 14, 2025

carlkyrillos Jan 14, 2025

MStokluska Jan 14, 2025

carlkyrillos Jan 14, 2025
