Skip to content

Commit 44324ab

Browse files
Fix removeUntaggedEc2: validate billing tags for EKS instances
Previously, EKS clusters without billing tags were deleted if they didn't match the skip pattern, even if they had valid billing tags. This fix adds proper billing tag validation for EKS instances. Changes: - Add has_valid_billing_tag() to validate category strings and timestamps - Check billing tags BEFORE skip pattern for EKS instances - Support Unix timestamp expiration in iit-billing-tag - Add comprehensive test suite with 12 test scenarios - All tests pass with proper timezone handling (UTC)
1 parent e403dd3 commit 44324ab

File tree

3 files changed

+977
-0
lines changed

3 files changed

+977
-0
lines changed
Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
# removeUntaggedEc2 Lambda - Fixed Logic
2+
3+
## Problem
4+
5+
The original Lambda had a bug in how it handled EKS instances:
6+
7+
**OLD BEHAVIOR:**
8+
- EKS cluster doesn't match `pe-.*` pattern → **Always delete cluster** (ignored `iit-billing-tag`)
9+
10+
**NEW BEHAVIOR:**
11+
- EKS cluster has valid `iit-billing-tag` (category or future timestamp) → **Keep it**
12+
- EKS cluster matches `pe-.*` pattern → **Keep it** (protected cluster)
13+
- EKS cluster has **neither** valid billing tag **nor** matches pattern → **Delete cluster**
14+
15+
## Changes Made
16+
17+
### 1. New Function: `has_valid_billing_tag(tags_dict, instance_launch_time)`
18+
19+
Validates `iit-billing-tag` values:
20+
- **Category strings** (e.g., `"pmm-staging"`, `"jenkins-pmm-slave"`) → Valid
21+
- **Unix timestamps** (e.g., `"1759837138"`) → Valid only if timestamp is in the future
22+
- **Empty or missing** → Invalid
23+
24+
### 2. Updated Function: `is_eks_managed_instance(instance, region)`
25+
26+
New logic flow:
27+
1. Check if instance has EKS tags
28+
2. **First check:** Does instance have valid billing tag? → Skip (keep it)
29+
3. **Second check:** Does cluster name match skip pattern? → Skip (protected)
30+
4. **Neither:** Mark cluster for deletion
31+
32+
### 3. Updated Function: `is_instance_to_terminate(instance)`
33+
34+
Now uses `has_valid_billing_tag()` for consistent validation.
35+
36+
## Test Coverage
37+
38+
Run `test_removeUntaggedEc2_logic.py` to validate logic without AWS:
39+
40+
```bash
41+
python3 cloud/aws-functions/test_removeUntaggedEc2_logic.py
42+
```
43+
44+
### Test Scenarios
45+
46+
**Regular EC2 Instances:**
47+
- ✓ With category tag (`jenkins-pmm-slave`) → Kept
48+
- ✓ With future timestamp → Kept
49+
- ✗ With expired timestamp → Terminated
50+
- ✗ No tag, running > 10min → Terminated
51+
- ○ No tag, running < 10min → Kept (grace period)
52+
53+
**EKS Instances - Protected Cluster (`pe-*`):**
54+
-`pe-crossplane` without tag → Kept (matches skip pattern)
55+
-`pe-infra` with tag → Kept (has billing tag)
56+
57+
**EKS Instances - Non-Protected Cluster:**
58+
-`pmm-test` with category tag → Kept (has valid tag)
59+
-`pmm-temp` with future timestamp → Kept (has valid tag)
60+
-`pmm-ha` without tag → **Cluster deleted**
61+
-`pmm-expired` with expired timestamp → **Cluster deleted**
62+
63+
## Deployment
64+
65+
### Option 1: Direct Lambda Update
66+
67+
```bash
68+
cd cloud/aws-functions
69+
zip removeUntaggedEc2.zip removeUntaggedEc2.py
70+
71+
aws lambda update-function-code \
72+
--function-name removeUntaggedEc2 \
73+
--zip-file fileb://removeUntaggedEc2.zip \
74+
--region eu-west-1 \
75+
--profile percona-dev-admin
76+
```
77+
78+
### Option 2: Update CloudFormation Stack
79+
80+
If the Lambda is managed by CloudFormation, update the inline code in the stack template.
81+
82+
## Environment Variables
83+
84+
- `EKS_SKIP_PATTERN` (default: `"pe-.*"`) - Regex pattern for protected EKS clusters
85+
86+
## How to Tag EKS Clusters
87+
88+
### Via eksctl Config
89+
90+
```yaml
91+
apiVersion: eksctl.io/v1alpha5
92+
kind: ClusterConfig
93+
metadata:
94+
name: pmm-ha
95+
region: us-east-2
96+
tags:
97+
iit-billing-tag: "pmm-eks"
98+
managedNodeGroups:
99+
- name: system
100+
tags:
101+
iit-billing-tag: "pmm-eks"
102+
propagateASGTags: true # Ensures tags go to EC2 instances
103+
```
104+
105+
### Via eksctl CLI
106+
107+
```bash
108+
eksctl create cluster --name pmm-ha \
109+
--tags iit-billing-tag=pmm-eks \
110+
--managed \
111+
--node-labels iit-billing-tag=pmm-eks
112+
```
113+
114+
### For Existing Clusters
115+
116+
Tag the node groups and update launch templates to propagate tags to new instances.
117+
118+
## Answer to Alex's Question
119+
120+
> Why are we skipping EKS instances if they don't have `iit-billing-tag` set?
121+
122+
**We're not anymore!** The fixed logic now:
123+
124+
1. **Checks billing tag first** - If EKS instance has valid `iit-billing-tag`, it's kept regardless of cluster name
125+
2. **Then checks skip pattern** - Only if no billing tag, check if cluster matches `pe-.*`
126+
3. **Deletes if neither** - Cluster gets deleted only if it has no valid billing tag AND doesn't match the pattern
127+
128+
The skip pattern (`pe-.*`) is a whitelist for Platform Engineering's long-running clusters that may not have billing tags but should be protected.
129+
130+
## Files
131+
132+
- `removeUntaggedEc2.py` - Updated Lambda function
133+
- `test_removeUntaggedEc2_logic.py` - Validation script with 12 test scenarios
134+
- `README_removeUntaggedEc2.md` - This documentation

0 commit comments

Comments
 (0)