README.md: 15 additions & 12 deletions
@@ -22,31 +22,34 @@ This project provides Kubernetes chaos testing capabilities covering scenarios l
**dest cluster**
-
2. Create `default/tke-chaos-precheck-resource ConfigMap` in `dest cluster` as a marker for testing eligibility, and create `tke-chaos-test-ns namespace`:
+
2. Create `tke-chaos-test/tke-chaos-precheck-resource ConfigMap` in `dest cluster` as a marker for testing eligibility:
3. Obtain `dest cluster`'s internal kubeconfig from Tencent Cloud TKE Console, save to `dest-cluster-kubeconfig` file, then create secret in `src cluster`:
4. Deploy Argo Workflow and Workflow templates in `src cluster` (skip if Argo is already deployed, [**Argo Documentation**](https://argo-workflows.readthedocs.io/en/latest/)):
+
4. Clone this project and then deploy Argo Workflow in `src cluster` (skip if Argo is already deployed, [**Argo Documentation**](https://argo-workflows.readthedocs.io/en/latest/)):
5. Enable public access for `tke-chaos-argo/tke-chaos-argo-workflows-server Service` in Tencent Cloud TKE Console. Access Argo Server UI at `LoadBalancer IP:2746` using credentials obtained via:
+
5. Enable public access for `tke-chaos-test/tke-chaos-argo-workflows-server Service` in Tencent Cloud TKE Console. Access Argo Server UI at `LoadBalancer IP:2746` using credentials obtained via:
- **Testing Configuration**: Before execution, you may need to configure parameters such as `webhook-url` for notifications. Default values are provided, so tests can run without modification. See [Scenario Parameters](playbook/README.md) for details.
-
- **Precheck**: Before execution, `dest cluster` health is validated by checking Node and Pod health ratios. Tests are blocked if either ratio is below its threshold (adjustable via `precheck-pods-health-ratio` and `precheck-nodes-health-ratio`). The precheck also verifies that `default/tke-chaos-precheck-resource ConfigMap` exists.
+
- **Precheck**: Before execution, `dest cluster` health is validated by checking Node and Pod health ratios. Tests are blocked if either ratio is below its threshold (adjustable via `precheck-pods-health-ratio` and `precheck-nodes-health-ratio`). The precheck also verifies that `tke-chaos-test/tke-chaos-precheck-resource ConfigMap` exists.
- **Execute Testing**: During kube-apiserver overload testing, the system floods `dest cluster`'s kube-apiserver with List Pod requests to simulate high load. Monitor kube-apiserver metrics via Tencent Cloud TKE Console and observe your business Pod health during testing.
- **Result Processing**: View testing results in Argo Server UI (recommended) or via `kubectl describe workflow {workflow-name}`.
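
The ConfigMap part of the Precheck step above can be sketched in shell as follows. The namespace and ConfigMap names are the ones this change introduces; the function names, the `KUBECTL` override, and the `--from-literal` payload are assumptions made for illustration and are not necessarily what the project itself runs.

```shell
#!/usr/bin/env sh
# Sketch only: helper names and the KUBECTL override are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"          # set KUBECTL=echo to preview the commands
NS="tke-chaos-test"                    # namespace named in the text
MARKER="tke-chaos-precheck-resource"   # ConfigMap named in the text

# Create the namespace and the marker ConfigMap in the dest cluster.
# The key/value pair is an arbitrary placeholder (assumption).
create_precheck_marker() {
  $KUBECTL create namespace "$NS" &&
    $KUBECTL create configmap "$MARKER" -n "$NS" --from-literal=marker=true
}

# Exit status 0 means the marker exists, so this part of the precheck passes.
verify_precheck_marker() {
  $KUBECTL get configmap "$MARKER" -n "$NS"
}
```

Run both functions with the kubeconfig context pointing at `dest cluster`; a non-zero status from `verify_precheck_marker` corresponds to the "missing ConfigMap" failure case.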
Monitor test progress via the Argo Server UI or `kubectl get workflow`. By default, tests run in the `default` namespace. You can also watch fault simulation Pods via `kubectl get po -w`; Pods in the Error state typically indicate test failures, which you can investigate via the Pod logs.
+
Monitor test progress via the Argo Server UI or `kubectl get -n tke-chaos-test workflow`. By default, tests run in the `tke-chaos-test` namespace. You can also watch fault simulation Pods via `kubectl get -n tke-chaos-test po -w`; Pods in the Error state typically indicate test failures, which you can investigate via the Pod logs.
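
The monitoring commands above can be wrapped in small shell helpers, roughly as sketched below. The `kubectl` invocations mirror the ones in the text; the function names and the `KUBECTL` override are hypothetical conveniences added for this example.

```shell
#!/usr/bin/env sh
# Sketch only: helper names and the KUBECTL override are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to preview the commands
NS="tke-chaos-test"             # namespace the tests run in, per the text

# List workflows and their current status.
list_workflows() { $KUBECTL get -n "$NS" workflow; }

# Stream Pod status changes; blocks until interrupted.
watch_pods() { $KUBECTL get -n "$NS" po -w; }

# Print the logs of one fault simulation Pod, e.g. one in the Error state.
pod_logs() { $KUBECTL logs -n "$NS" "$1"; }
```

For example, `pod_logs <pod-name>` prints the logs of a Pod that `watch_pods` showed in the Error state.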
3. What are common failure reasons?
-
Typical issues include: insufficient RBAC permissions for fault simulation Pods, missing `default/tke-chaos-precheck-resource ConfigMap` in target cluster, missing `tke-chaos-test-ns namespace`, or Argo workflow controller anomalies. Check fault simulation Pod or Argo Workflow Controller logs for details.
+
Typical issues include: insufficient RBAC permissions for fault simulation Pods, missing `tke-chaos-test/tke-chaos-precheck-resource ConfigMap` in target cluster, missing `tke-chaos-test namespace`, or Argo workflow controller anomalies. Check fault simulation Pod or Argo Workflow Controller logs for details.
4. How to troubleshoot Argo Workflow Controller issues?
If `kubectl get workflow` shows no status for a newly created workflow, the Argo workflow-controller is likely malfunctioning. Check the controller logs via:
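You can open the `Argo server UI` to view the test workflow, or run `kubectl get workflow` to check the workflow's execution status. By default, tests run in the default namespace. You can also run `kubectl get po -w` to watch the `Pod`s that execute the test; a `Pod` in the `Error` state most likely means the test failed, and you can check that `Pod`'s logs to troubleshoot.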
+
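You can open the `Argo server UI` to view the test workflow, or run `kubectl get -n tke-chaos-test workflow` to check the workflow's execution status. Tests run in the `tke-chaos-test` namespace. You can also run `kubectl get -n tke-chaos-test po -w` to watch the `Pod`s that execute the test; a `Pod` in the `Error` state most likely means the test failed, and you can check that `Pod`'s logs to troubleshoot.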
This scenario simulates high load on `kube-apiserver` with the following workflow:
-
- **Pre-check**: Performs health checks on the target cluster, verifying the health ratio of Nodes and Pods. If below threshold, the test will be aborted. You can adjust thresholds via `precheck-pods-health-ratio` and `precheck-nodes-health-ratio` parameters. Also checks for existence of `default/tke-chaos-precheck-resource ConfigMap`.
+
- **Pre-check**: Performs health checks on the target cluster, verifying the health ratio of Nodes and Pods. If below threshold, the test will be aborted. You can adjust thresholds via `precheck-pods-health-ratio` and `precheck-nodes-health-ratio` parameters. Also checks for existence of `tke-chaos-test/tke-chaos-precheck-resource ConfigMap`.
- **Resource Warm-up**: Creates resources (`pods/configmaps`) to simulate production environment scale.
- **Fault Injection**: Floods apiserver with `list pod/configmaps` requests to simulate high load.
- **Cleanup**: Cleans up resources created during the test.
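
The health-ratio gate in the Pre-check step can be illustrated with a minimal shell sketch. It assumes the thresholds are fractions between 0 and 1 and uses `0.9` as a placeholder default; the function names and the `NODES_RATIO` / `PODS_RATIO` variables are hypothetical stand-ins for `precheck-nodes-health-ratio` and `precheck-pods-health-ratio`, and the project's actual implementation may differ.

```shell
#!/usr/bin/env sh
# Sketch only: names, defaults, and the 0..1 ratio scale are assumptions.

# ratio_ok HEALTHY TOTAL THRESHOLD
# Succeeds (exit 0) when HEALTHY/TOTAL >= THRESHOLD. An empty set
# (TOTAL = 0) is treated as a failure, which is a choice made by this sketch.
ratio_ok() {
  awk -v h="$1" -v t="$2" -v th="$3" 'BEGIN { exit !(t > 0 && h / t >= th) }'
}

# precheck NODES_OK NODES_TOTAL PODS_OK PODS_TOTAL
# Succeeds only when both the Node ratio and the Pod ratio meet their thresholds.
precheck() {
  ratio_ok "$1" "$2" "${NODES_RATIO:-0.9}" &&
    ratio_ok "$3" "$4" "${PODS_RATIO:-0.9}"
}
```

For instance, `precheck 10 10 99 100` succeeds with the placeholder thresholds, while `precheck 5 10 100 100` fails because only half of the Nodes are healthy, which corresponds to the test being aborted.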