
Commit fe11c08

Merge pull request #7 from SQxiaoxiaomeng/chaos-palybook
add namespace delete scenario
2 parents 2cbcbb9 + 79fa202

File tree: 5 files changed (+183 -6 lines changed)

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -62,7 +62,7 @@ Using `kube-apiserver overload` as an example:
 
 - Create kube-apiserver overload workflow:
 ```bash
-kubectl create -f playbook/rabc.yaml && kubectl create -f playbook/all-in-one-template.yaml && kubectl create -f playbook/workflow/apiserver-overload-scenario.yaml
+kubectl create -f playbook/rbac.yaml && kubectl create -f playbook/all-in-one-template.yaml && kubectl create -f playbook/workflow/apiserver-overload-scenario.yaml
 ```
 
 ![apiserver overload flowchart](./playbook/docs/chaos-flowchart-en.png)
@@ -91,7 +91,7 @@ kubectl delete workflow {workflow-name}
 | etcd overload (ReadCache/Consistent cache) | - | Completed | - | Add etcd overload protection policy, simulate etcd high load |
 | coredns outage | - | Completed | - | Simulate coredns service outage |
 | kubernetes-proxy outage | - | Completed | - | Simulate kubernetes-proxy outage |
-| accidental deletion scenario | P0 | In Progress | 2025-05-30 | Simulate accidental resource deletion |
+| accidental deletion scenario | - | Completed | - | Simulate accidental resource deletion |
 | kube-apiserver outage | P0 | In Progress | 2025-06-15 | Simulate kube-apiserver outage |
 | etcd outage | P0 | In Progress | 2025-06-15 | Simulate etcd cluster failure |
 | kube-scheduler outage | P0 | In Progress | 2025-06-15 | Test scheduling behavior during scheduler failure |
````

README_zh.md

Lines changed: 3 additions & 4 deletions
````diff
@@ -62,15 +62,15 @@ kubectl exec -it -n tke-chaos-test deployment/tke-chaos-argo-workflows-server --
 
 - Create the `kube-apiserver` overload drill `workflow`:
 ```bash
-kubectl create -f playbook/rabc.yaml && kubectl create -f playbook/all-in-one-template.yaml && kubectl create -f playbook/workflow/apiserver-overload-scenario.yaml
+kubectl create -f playbook/rbac.yaml && kubectl create -f playbook/all-in-one-template.yaml && kubectl create -f playbook/workflow/apiserver-overload-scenario.yaml
 ```
 
 ![apiserver overload drill flowchart](./playbook/docs/chaos-flowchart-zh.png)
 
 
 **Core workflow description**
 
-- **Drill configuration**: Before running a drill, you may need to set some drill parameters, such as the `webhook-url` parameter for WeCom group notifications. Every parameter has a default, so a drill can run without modifying anything. Per-scenario parameters are described in [Drill Scenario Parameter Reference](playbook/README.md).
+- **Drill configuration**: Before running a drill, you may need to set some drill parameters, such as the `webhook-url` parameter for WeCom group notifications. Every parameter has a default, so a drill can run without modifying anything. Per-scenario parameters are described in [Drill Scenario Parameter Reference](playbook/README_zh.md).
 - **Pre-drill checks**: Before a drill starts, the `target cluster` is health-checked: if the healthy ratio of `Node`s or `Pod`s in the drill cluster falls below the thresholds, the drill is not allowed to run; the thresholds can be tuned via the `precheck-pods-health-ratio` and `precheck-nodes-health-ratio` parameters. The drill is also blocked unless the `tke-chaos-test/tke-chaos-precheck-resource ConfigMap` exists in the `target cluster`.
 - **Drill execution**: During the `kube-apiserver` overload drill, a flood of `List Pod` requests is sent to the `target cluster`'s `kube-apiserver` to simulate high load. You can watch the `kube-apiserver` load in the core-component monitoring for the `target cluster` in the Tencent Cloud TKE console, and you should also watch the health of your business Pods during the drill to verify whether `kube-apiserver` overload affects your workloads.
 - **Drill results**: View the results in the `Argo Server UI` (recommended), or run `kubectl describe workflow {workflow-name}`.
````
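The pre-drill check described above requires a marker ConfigMap in the target cluster before any drill may run. A minimal sketch of creating it, assuming no particular data keys are required:

```bash
# Create the marker ConfigMap the pre-drill check looks for
# (tke-chaos-test/tke-chaos-precheck-resource in the target cluster):
kubectl create configmap tke-chaos-precheck-resource -n tke-chaos-test
```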
````diff
@@ -92,15 +92,14 @@ kubectl delete workflow {workflow-name}
 | etcd overload drill (add etcd overload protection policy) | - | Completed | - | Add an etcd overload protection policy and simulate etcd high load |
 | coredns outage | - | Completed | - | Simulate a coredns service outage |
 | kubernetes-proxy outage | - | Completed | - | Simulate a kubernetes-proxy service outage |
-| accidental resource deletion | P0 | In Progress | 2025-05-30 | Simulate accidental deletion of resources |
+| accidental resource deletion | - | Completed | - | Simulate accidental deletion of resources |
 | kube-apiserver outage drill | P0 | In Progress | 2025-06-15 | Simulate a kube-apiserver service outage |
 | etcd outage drill | P0 | In Progress | 2025-06-15 | Simulate an etcd cluster failure |
 | kube-scheduler outage drill | P0 | In Progress | 2025-06-15 | Test cluster scheduling behavior during a scheduler failure |
 | kube-controller-manager outage drill | P0 | In Progress | 2025-06-15 | Verify controller component failure scenarios |
 | cloud-controller-manager outage drill | P0 | In Progress | 2025-06-15 | Verify controller component failure scenarios |
 | master node shutdown | P1 | In Progress | 2025-06-15 | Simulate a master shutdown |
 
-
 ## FAQ
 1. Why use two clusters to run drill tests?
 
````

playbook/README.md

Lines changed: 17 additions & 0 deletions
````diff
@@ -83,6 +83,23 @@ This scenario simulates kubernetes-proxy service disruption by:
 | `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
 
+## Namespace Deletion Protection
+
+**playbook**: `workflow/namespace-delete-scenario.yaml`
+
+This scenario tests Tencent Cloud TKE's namespace deletion block policy with the following workflow:
+- **Create the namespace deletion block policy**: Creates a namespace deletion constraint policy that prevents deleting namespaces that still contain Pods
+- **Create test resources**: Creates the test namespace `tke-chaos-ns-76498` and a Pod inside it
+- **Verify protection**: Attempts to delete the namespace while the Pod exists, to verify the policy blocks the deletion
+- **Cleanup**: Deletes the Pod first, then the namespace, and finally removes the namespace deletion block policy
+
+**Parameters**
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
+
+Tencent Cloud TKE supports various resource protection policies, such as CRD deletion protection and PV deletion protection; see the official Tencent Cloud documentation for details: [Policy Management](https://cloud.tencent.com/document/product/457/103179)
+
 ## TKE Self-maintenance of Master cluster's kube-apiserver Disruption
 TODO
 
````
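Launching this scenario follows the same pattern as the top-level README; a minimal sketch, assuming `playbook/rbac.yaml` and `playbook/all-in-one-template.yaml` have already been applied:

```bash
# Create the namespace-deletion drill workflow:
kubectl create -f playbook/workflow/namespace-delete-scenario.yaml
# Follow its progress; block-namespace-deletion-scenario is the Workflow
# name defined in the playbook manifest:
kubectl -n tke-chaos-test get workflow block-namespace-deletion-scenario -w
```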

playbook/README_zh.md

Lines changed: 17 additions & 0 deletions
````diff
@@ -83,6 +83,23 @@
 | `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name; if empty, the drill runs against the current cluster |
 
+## Namespace Deletion Protection
+
+**playbook**: `workflow/namespace-delete-scenario.yaml`
+
+This scenario tests the namespace deletion protection policy of Tencent Cloud TKE clusters. The main steps are:
+- **Create the protection policy**: Creates a namespace deletion constraint policy that prevents deleting namespaces that still contain Pods
+- **Create test resources**: Creates the test namespace `tke-chaos-ns-76498` and a Pod
+- **Verify the protection**: Attempts to delete the namespace containing the Pod, to verify the policy takes effect
+- **Clean up test resources**: Deletes the Pod first, then the namespace, and finally removes the protection policy
+
+**Parameters**
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name; if empty, the drill runs against the current cluster |
+
+Tencent Cloud TKE supports many resource protection policies, such as `CRD` deletion protection and `PV` deletion protection; see the official Tencent Cloud documentation for details: [Policy Management](https://cloud.tencent.com/document/product/457/103179)
+
 ## TKE Self-maintained Master Cluster kube-apiserver Outage
 TODO
 
````
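As with the other scenarios, a finished drill Workflow can be removed with `kubectl delete workflow`; a sketch using the name and namespace defined in the playbook manifest:

```bash
kubectl -n tke-chaos-test delete workflow block-namespace-deletion-scenario
```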

playbook/workflow/namespace-delete-scenario.yaml

Lines changed: 144 additions & 0 deletions
```yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: block-namespace-deletion-scenario
  namespace: tke-chaos-test
spec:
  entrypoint: main
  serviceAccountName: tke-chaos
  arguments:
    parameters:
    - name: kubeconfig-secret-name
      value: "dest-cluster-kubeconfig"
    # Constraint that denies deletion of namespaces that still contain Pods.
    - name: block-namespace-deletion-manifest
      value: |
        apiVersion: constraints.gatekeeper.sh/v1beta1
        kind: BlockNamespaceDeletion
        metadata:
          name: tke-chaos-test
        spec:
          enforcementAction: deny
          match:
            kinds:
            - apiGroups:
              - '*'
              kinds:
              - Namespace
    # Test Pod pinned to a non-existent node, so it stays Pending and
    # never actually runs anything.
    - name: pod-manifest
      value: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: pod-76498
          namespace: tke-chaos-ns-76498
          labels:
            pod-name: pod-76498
        spec:
          nodeSelector:
            kubernetes.io/hostname: non-existent-node-3060282720
          tolerations:
          - key: node.kubernetes.io/unreachable
            operator: Exists
            effect: NoSchedule
          containers:
          - name: busybox
            image: busybox:1.37.0
            command: ["sleep", "3600"]
          restartPolicy: Never
  templates:
  - name: main
    steps:
    - - name: create-block-namespace-deletion
        arguments:
          parameters:
          - name: action
            value: "create"
          - name: manifest
            value: "{{workflow.parameters.block-namespace-deletion-manifest}}"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-mutating-cmd
          clusterScope: true
    - - name: create-namespace
        arguments:
          parameters:
          - name: cmd
            value: "create namespace tke-chaos-ns-76498"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-cmd
          clusterScope: true
    - - name: create-pod-in-namespace
        arguments:
          parameters:
          - name: action
            value: "create"
          - name: manifest
            value: "{{workflow.parameters.pod-manifest}}"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-mutating-cmd
          clusterScope: true
    # Expected to be denied while the Pod still exists; continueOn lets
    # the workflow proceed to cleanup either way.
    - - name: delete-namespace-with-pod-in-namespace
        continueOn:
          failed: true
          error: true
        arguments:
          parameters:
          - name: cmd
            value: "delete namespace tke-chaos-ns-76498"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-cmd
          clusterScope: true
    # - - name: suspend
    #     template: suspend
    - - name: delete-pod-in-namespace
        arguments:
          parameters:
          - name: action
            value: "delete"
          - name: manifest
            value: "{{workflow.parameters.pod-manifest}}"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-mutating-cmd
          clusterScope: true
    - - name: delete-namespace-without-pod-in-namespace
        arguments:
          parameters:
          - name: cmd
            value: "delete namespace tke-chaos-ns-76498"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-cmd
          clusterScope: true
    - - name: delete-block-namespace-deletion
        arguments:
          parameters:
          - name: action
            value: "delete"
          - name: manifest
            value: "{{workflow.parameters.block-namespace-deletion-manifest}}"
          - name: kubeconfig-secret-name
            value: "{{workflow.parameters.kubeconfig-secret-name}}"
        templateRef:
          name: kubectl-cmd
          template: kubectl-mutating-cmd
          clusterScope: true

  - name: suspend
    suspend: {}
```
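The `delete-namespace-with-pod-in-namespace` step is where the policy is exercised. Run by hand, it looks like the sketch below, assuming the `BlockNamespaceDeletion` constraint above is installed in the target cluster:

```bash
# With the constraint in place and pod-76498 still present, the deletion
# should be rejected at admission:
kubectl delete namespace tke-chaos-ns-76498
# Expected: a denial from the policy admission webhook (the exact message
# depends on the policy engine version); once the Pod is deleted, the same
# command should succeed.
```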
