|
| 1 | +# removeUntaggedEc2 Lambda - Fixed Logic |
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +The original Lambda had a bug in how it handled EKS instances: |
| 6 | + |
| 7 | +**OLD BEHAVIOR:** |
| 8 | +- EKS cluster doesn't match `pe-.*` pattern → **Always delete cluster** (ignored `iit-billing-tag`) |
| 9 | + |
| 10 | +**NEW BEHAVIOR:** |
| 11 | +- EKS cluster has valid `iit-billing-tag` (category or future timestamp) → **Keep it** |
| 12 | +- EKS cluster matches `pe-.*` pattern → **Keep it** (protected cluster) |
| 13 | +- EKS cluster has **neither** valid billing tag **nor** matches pattern → **Delete cluster** |
| 14 | + |
| 15 | +## Changes Made |
| 16 | + |
| 17 | +### 1. New Function: `has_valid_billing_tag(tags_dict, instance_launch_time)` |
| 18 | + |
| 19 | +Validates `iit-billing-tag` values: |
| 20 | +- **Category strings** (e.g., `"pmm-staging"`, `"jenkins-pmm-slave"`) → Valid |
| 21 | +- **Unix timestamps** (e.g., `"1759837138"`) → Valid only if timestamp is in the future |
| 22 | +- **Empty or missing** → Invalid |
| 23 | + |
| 24 | +### 2. Updated Function: `is_eks_managed_instance(instance, region)` |
| 25 | + |
| 26 | +New logic flow: |
| 27 | +1. Check if instance has EKS tags |
| 28 | +2. **First check:** Does instance have valid billing tag? → Skip (keep it) |
| 29 | +3. **Second check:** Does cluster name match skip pattern? → Skip (protected) |
| 30 | +4. **Neither:** Mark cluster for deletion |
| 31 | + |
| 32 | +### 3. Updated Function: `is_instance_to_terminate(instance)` |
| 33 | + |
| 34 | +Now uses `has_valid_billing_tag()` for consistent validation. |
| 35 | + |
| 36 | +## Test Coverage |
| 37 | + |
| 38 | +Run `test_removeUntaggedEc2_logic.py` to validate logic without AWS: |
| 39 | + |
| 40 | +```bash |
| 41 | +python3 cloud/aws-functions/test_removeUntaggedEc2_logic.py |
| 42 | +``` |
| 43 | + |
| 44 | +### Test Scenarios |
| 45 | + |
| 46 | +**Regular EC2 Instances:** |
| 47 | +- ✓ With category tag (`jenkins-pmm-slave`) → Kept |
| 48 | +- ✓ With future timestamp → Kept |
| 49 | +- ✗ With expired timestamp → Terminated |
| 50 | +- ✗ No tag, running > 10min → Terminated |
| 51 | +- ○ No tag, running < 10min → Kept (grace period) |
| 52 | + |
| 53 | +**EKS Instances - Protected Cluster (`pe-*`):** |
| 54 | +- ✓ `pe-crossplane` without tag → Kept (matches skip pattern) |
| 55 | +- ✓ `pe-infra` with tag → Kept (has billing tag) |
| 56 | + |
| 57 | +**EKS Instances - Non-Protected Cluster:** |
| 58 | +- ✓ `pmm-test` with category tag → Kept (has valid tag) |
| 59 | +- ✓ `pmm-temp` with future timestamp → Kept (has valid tag) |
| 60 | +- ✗ `pmm-ha` without tag → **Cluster deleted** |
| 61 | +- ✗ `pmm-expired` with expired timestamp → **Cluster deleted** |
| 62 | + |
| 63 | +## Deployment |
| 64 | + |
| 65 | +### Option 1: Direct Lambda Update |
| 66 | + |
| 67 | +```bash |
| 68 | +cd cloud/aws-functions |
| 69 | +zip removeUntaggedEc2.zip removeUntaggedEc2.py |
| 70 | + |
| 71 | +aws lambda update-function-code \ |
| 72 | + --function-name removeUntaggedEc2 \ |
| 73 | + --zip-file fileb://removeUntaggedEc2.zip \ |
| 74 | + --region eu-west-1 \ |
| 75 | + --profile percona-dev-admin |
| 76 | +``` |
| 77 | + |
| 78 | +### Option 2: Update CloudFormation Stack |
| 79 | + |
| 80 | +If the Lambda is managed by CloudFormation, update the inline code in the stack template. |
| 81 | + |
| 82 | +## Environment Variables |
| 83 | + |
| 84 | +- `EKS_SKIP_PATTERN` (default: `"pe-.*"`) - Regex pattern for protected EKS clusters |
| 85 | + |
| 86 | +## How to Tag EKS Clusters |
| 87 | + |
| 88 | +### Via eksctl Config |
| 89 | + |
| 90 | +```yaml |
| 91 | +apiVersion: eksctl.io/v1alpha5 |
| 92 | +kind: ClusterConfig |
| 93 | +metadata: |
| 94 | + name: pmm-ha |
| 95 | + region: us-east-2 |
| 96 | + tags: |
| 97 | + iit-billing-tag: "pmm-eks" |
| 98 | +managedNodeGroups: |
| 99 | + - name: system |
| 100 | + tags: |
| 101 | + iit-billing-tag: "pmm-eks" |
| 102 | + propagateASGTags: true # Ensures tags go to EC2 instances |
| 103 | +``` |
| 104 | +
|
| 105 | +### Via eksctl CLI |
| 106 | +
|
| 107 | +```bash |
| 108 | +eksctl create cluster --name pmm-ha \ |
| 109 | + --tags iit-billing-tag=pmm-eks \ |
| 110 | + --managed \ |
| 111 | + --node-labels iit-billing-tag=pmm-eks |
| 112 | +``` |
| 113 | + |
| 114 | +### For Existing Clusters |
| 115 | + |
| 116 | +Tag the node groups and update launch templates to propagate tags to new instances. |
| 117 | + |
| 118 | +## Answer to Alex's Question |
| 119 | + |
| 120 | +> Why are we skipping EKS instances if they don't have `iit-billing-tag` set? |
| 121 | +
|
| 122 | +**We're not anymore!** The fixed logic now: |
| 123 | + |
| 124 | +1. **Checks billing tag first** - If EKS instance has valid `iit-billing-tag`, it's kept regardless of cluster name |
| 125 | +2. **Then checks skip pattern** - Only if no billing tag, check if cluster matches `pe-.*` |
| 126 | +3. **Deletes if neither** - Cluster gets deleted only if it has no valid billing tag AND doesn't match the pattern |
| 127 | + |
| 128 | +The skip pattern (`pe-.*`) is a whitelist for Platform Engineering's long-running clusters that may not have billing tags but should be protected. |
| 129 | + |
| 130 | +## Files |
| 131 | + |
| 132 | +- `removeUntaggedEc2.py` - Updated Lambda function |
| 133 | +- `test_removeUntaggedEc2_logic.py` - Validation script with 12 test scenarios |
| 134 | +- `README_removeUntaggedEc2.md` - This documentation |
0 commit comments