Hibernate clusters instead deleting them #613

Closed
6 tasks
tobiscr opened this issue Jan 22, 2025 · 1 comment
Assignees
Labels
area/control-plane Related to all activities around Kyma Control Plane kind/feature Categorizes issue or PR as related to a new feature.

Comments

@tobiscr
Contributor

tobiscr commented Jan 22, 2025

Description

Instead of deleting a Kyma runtime directly when a RuntimeCR gets deleted, we should hibernate the Gardener cluster and set the RuntimeCR to disposed. The final deletion request is issued later by housekeeping logic (either by a separate job or during the time-based reconciliation of all RuntimeCRs).

This feature has to cope with several circumstances:

  • How to behave when the hibernation of a cluster fails:

    • If the hibernation is blocked by webhooks, KIM will delete the webhooks and retry the hibernation.
    • Deletion of the worker pools will be triggered (this step reduces costs to the minimum).

    If this does not resolve the hanging hibernation, KIM forces the deletion of the cluster after a few hours (timeout: 2 hours).

  • A new state in RuntimeCR is required (e.g. disposed)

  • For proper billing, we should start supporting new fields in the RuntimeCR which indicate when a cluster was provisioned and when its deletion was requested (see Transition from KEB API to KIM Runtime CR kyma-metrics-collector#89)

  • A housekeeping job is required to delete disposed clusters when the retention period is reached
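
The fallback steps for a hanging hibernation could be sketched roughly as below. This is only an illustration of the proposal, not KIM code: the function `next_action`, the `Action` names, and the ordering of the checks are assumptions; only the 2-hour timeout comes from the text above.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    # Hypothetical action names, chosen for this sketch only.
    WAIT = "wait"
    DELETE_WEBHOOKS_AND_RETRY = "delete_webhooks_and_retry"
    DELETE_WORKER_POOLS = "delete_worker_pools"
    FORCE_DELETE = "force_delete"


# Taken from the issue text ("timeout: 2 hours").
HIBERNATION_TIMEOUT = timedelta(hours=2)


def next_action(started_at, now, blocked_by_webhooks, worker_pools_deleted,
                timeout=HIBERNATION_TIMEOUT):
    """Pick the next fallback step for a hibernation that has not completed."""
    # Last resort: the hibernation has been hanging for at least the timeout.
    if now - started_at >= timeout:
        return Action.FORCE_DELETE
    # Blocking webhooks are removed first, then the hibernation is retried.
    if blocked_by_webhooks:
        return Action.DELETE_WEBHOOKS_AND_RETRY
    # Removing the worker pools reduces costs while the hibernation still hangs.
    if not worker_pools_deleted:
        return Action.DELETE_WORKER_POOLS
    # Nothing left to try before the timeout; keep waiting.
    return Action.WAIT
```

A reconciler would call `next_action` on every pass for a cluster whose hibernation is still pending and execute the returned step.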

AC:

  • KIM no longer deletes clusters directly but hibernates them instead
    • A deletion request for a RuntimeCR which is not already in disposed status triggers the hibernation of the cluster.
    • The RuntimeCR of a successfully hibernated cluster gets the status disposed and is marked as non-billable (see KIM has to mark the RuntimeCR to be billable #547)
    • Clusters whose RuntimeCRs are in disposed status and have reached a pre-defined retention time (e.g. 3 days) are finally deleted in Gardener
  • Retention timeout for disposed clusters is configurable in KIM
  • It's possible to block the deletion of a disposed cluster (e.g. by setting an annotation)
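
A minimal sketch of how these acceptance criteria fit together, assuming a simplified in-memory `Runtime` record instead of the real RuntimeCR. All names here (`Runtime`, `handle_deletion_request`, `mark_hibernated`, `due_for_final_deletion`, and the annotation key) are hypothetical; the 3-day default is the example value from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical annotation key; the issue only says "e.g. by setting an annotation".
BLOCK_DELETION_ANNOTATION = "example.invalid/block-deletion"


@dataclass
class Runtime:
    # Simplified stand-in for a RuntimeCR, not the real schema.
    state: str = "ready"
    billable: bool = True
    disposed_at: Optional[datetime] = None
    annotations: dict = field(default_factory=dict)


def handle_deletion_request(rt):
    """A deletion request hibernates the cluster unless it is already disposed."""
    return "hibernate" if rt.state != "disposed" else "await_retention"


def mark_hibernated(rt, now):
    """After a successful hibernation: disposed, non-billable, timestamp recorded."""
    rt.state = "disposed"
    rt.billable = False
    rt.disposed_at = now


def due_for_final_deletion(rt, now, retention=timedelta(days=3)):
    """Housekeeping check: delete in Gardener only after the retention period."""
    if rt.state != "disposed" or rt.disposed_at is None:
        return False
    # An annotation can block the final deletion of a disposed cluster.
    if rt.annotations.get(BLOCK_DELETION_ANNOTATION) == "true":
        return False
    return now - rt.disposed_at >= retention
```

The `retention` parameter stands in for the configurable retention timeout; a housekeeping job would pass the configured value rather than rely on the default.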

Reasons

Support manual cluster recovery and address the risk of unintentionally deleted Kyma runtimes (caused either by customers, SAP employees, or a software bug).

Attachments

tobiscr added the labels area/control-plane (Related to all activities around Kyma Control Plane) and kind/feature (Categorizes issue or PR as related to a new feature.) on Jan 22, 2025
tobiscr changed the title from "Safe deletion of Kyma Clusters in KIM: hibernate clusters instead directly deleting it" to "Safe deletion of Kyma Clusters in KIM: hibernate clusters instead directly deleting them" on Jan 22, 2025
tobiscr changed the title from "Safe deletion of Kyma Clusters in KIM: hibernate clusters instead directly deleting them" to "Safe deletion of Kyma Clusters in KIM (hibernate before delete) [KIM/feature]" on Jan 22, 2025
tobiscr changed the title from "Safe deletion of Kyma Clusters in KIM (hibernate before delete) [KIM/feature]" to "Hibernate clusters instead deleting them" on Jan 24, 2025
tobiscr self-assigned this on Feb 3, 2025
@tobiscr
Contributor Author

tobiscr commented Feb 5, 2025

Closing this issue, as we found several technical disadvantages in using hibernation to prevent unintended cluster deletion. Focusing now on #642.

@tobiscr tobiscr closed this as completed Feb 5, 2025