Commit c27dbd6

feat: replace bash cleanup script with native Go implementation
Migrate AWS VPC cleanup functionality from bash script to a native Go package
and CLI command, improving maintainability, error handling, and integration
with the holodeck toolchain.

BREAKING CHANGE: The cleanup functionality is now available through
`holodeck cleanup` command instead of scripts/awscleanup.sh

Changes:
- Add pkg/cleanup package with comprehensive AWS resource deletion
  - Handles EC2 instances, security groups, subnets, route tables, IGWs
  - Includes GitHub job status checking via API
  - Implements retry logic for VPC deletion (3 attempts, 30s delay)
  - Better error handling with partial failure support
- Add `holodeck cleanup` CLI command
  - Accepts multiple VPC IDs in a single invocation
  - --region flag for AWS region specification
  - --force flag to skip GitHub job status checks
  - Comprehensive help documentation
- Update periodic GitHub workflow
  - Now uses `holodeck cleanup` instead of bash script
  - Builds Go binary as part of the workflow
  - Better error handling and logging
  - Handles multiple VPCs more efficiently
- Add documentation
  - docs/commands/cleanup.md with detailed usage examples
  - Updated README.md with cleanup examples
  - Updated command reference documentation
- Add deprecation warning to scripts/awscleanup.sh
  - Script still functional but warns users to migrate
  - Can be removed in a future release

Benefits:
- Type safety and better error handling
- Consistent with other holodeck commands
- Easier to test and maintain
- More efficient handling of multiple VPCs
- Better integration with CI/CD pipelines

Fixes: Improves reliability of periodic cleanup jobs
Refs: Migration from bash to Go for better maintainability

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
1 parent 14a7880 commit c27dbd6
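
The commit message mentions GitHub job status checking via the API, but pkg/cleanup is not shown on this page. The following is only a minimal sketch of how such a check could work, assuming the response shape of GitHub's "list jobs for a workflow run" endpoint (`GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`). The names `jobsResponse` and `allJobsCompleted` are hypothetical, the HTTP request itself is omitted, and treating an empty job list as "not completed" is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobsResponse mirrors the subset of the "list jobs for a workflow run"
// response needed here. Hypothetical type, not taken from pkg/cleanup.
type jobsResponse struct {
	Jobs []struct {
		Name   string `json:"name"`
		Status string `json:"status"` // e.g. "queued", "in_progress", "completed"
	} `json:"jobs"`
}

// allJobsCompleted reports whether every job in the payload has finished.
// An empty job list counts as "not completed" so that nothing is deleted
// on the basis of missing data (an assumption of this sketch).
func allJobsCompleted(payload []byte) (bool, error) {
	var resp jobsResponse
	if err := json.Unmarshal(payload, &resp); err != nil {
		return false, fmt.Errorf("decoding jobs response: %w", err)
	}
	if len(resp.Jobs) == 0 {
		return false, nil
	}
	for _, job := range resp.Jobs {
		if job.Status != "completed" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Sample payload with one finished and one running job.
	payload := []byte(`{"jobs":[{"name":"e2e","status":"completed"},{"name":"lint","status":"in_progress"}]}`)
	done, err := allJobsCompleted(payload)
	fmt.Println(done, err)
}
```

With a check like this, a caller would skip deletion while any job is still running, which matches the intent of the `--force` flag described in the commit message: force bypasses the check entirely.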

12 files changed: +991 -9 lines changed

.github/workflows/periodic.yaml

Lines changed: 24 additions & 3 deletions
@@ -1,5 +1,8 @@
 name: Periodic actions
 
+# This workflow performs periodic cleanup of AWS resources
+# It uses the holodeck cleanup command to remove VPCs tagged with Project=holodeck and Environment=cicd
+
 on:
   schedule:
     - cron: '0 0,12 * * *' # Runs daily at 12AM and 12PM
@@ -15,6 +18,15 @@ jobs:
       - name: Checkout repository
         uses: actions/checkout@v4
 
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: 'go.mod'
+          cache: true
+
+      - name: Build holodeck CLI
+        run: make build-cli
+
       - name: Set up AWS CLI
         uses: aws-actions/configure-aws-credentials@v4
         with:
@@ -38,9 +50,18 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
-          for vpcid in $AWS_VPC_IDS; do
-            scripts/awscleanup.sh $vpcid
-          done
+          # Use the new holodeck cleanup command
+          # The AWS_REGION is already set by aws-actions/configure-aws-credentials
+          echo "Cleaning up VPCs in region ${{ matrix.aws-region }}: $AWS_VPC_IDS"
+          if [ -n "$AWS_VPC_IDS" ]; then
+            # The cleanup command can handle multiple VPCs at once
+            ./bin/holodeck cleanup $AWS_VPC_IDS || {
+              echo "::warning::Some VPCs failed to cleanup in region ${{ matrix.aws-region }}"
+              exit 0 # Don't fail the workflow if some cleanups fail
+            }
+          else
+            echo "No VPCs to clean up in region ${{ matrix.aws-region }}"
+          fi
 
       - name: Post cleanup
         run: |

README.md

Lines changed: 17 additions & 0 deletions
@@ -39,6 +39,17 @@ See [docs/prerequisites.md](docs/prerequisites.md) for details.
 
 ---
 
+## ⚠️ Important: Kernel Compatibility
+
+When installing NVIDIA drivers, Holodeck requires kernel headers matching your running kernel
+version. If exact headers are unavailable, Holodeck will attempt to find compatible ones,
+though this may cause driver compilation issues.
+
+For kernel compatibility details and troubleshooting, see
+[Kernel Compatibility](docs/prerequisites.md#kernel-compatibility) in the prerequisites documentation.
+
+---
+
 ## 📝 How to Contribute
 
 See [docs/contributing/](docs/contributing/) for full details.
@@ -78,6 +89,12 @@ holodeck list
 holodeck delete <instance-id>
 ```
 
+### Example: Clean up AWS VPC resources
+
+```bash
+holodeck cleanup vpc-12345678
+```
+
 ### Example: Check status
 
 ```bash

cmd/cli/cleanup/cleanup.go

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package cleanup
+
+import (
+	"fmt"
+	"os"
+
+	"github.com/NVIDIA/holodeck/internal/logger"
+	"github.com/NVIDIA/holodeck/pkg/cleanup"
+
+	cli "github.com/urfave/cli/v2"
+)
+
+type command struct {
+	log *logger.FunLogger
+	region string
+	forceDelete bool
+}
+
+// NewCommand constructs the cleanup command with the specified logger
+func NewCommand(log *logger.FunLogger) *cli.Command {
+	c := &command{
+		log: log,
+	}
+	return c.build()
+}
+
+func (m *command) build() *cli.Command {
+	// Create the 'cleanup' command
+	cleanup := cli.Command{
+		Name: "cleanup",
+		Usage: "Clean up AWS VPC resources",
+		Description: `Clean up AWS VPC resources by VPC ID.
+
+This command will:
+- Check GitHub job status (if GITHUB_TOKEN is set and tags are present)
+- Delete all resources in the VPC including:
+* EC2 instances
+* Security groups
+* Subnets
+* Route tables
+* Internet gateways
+* The VPC itself
+
+Examples:
+# Clean up a single VPC
+holodeck cleanup vpc-12345678
+
+# Clean up multiple VPCs
+holodeck cleanup vpc-12345678 vpc-87654321
+
+# Force cleanup without job status check
+holodeck cleanup --force vpc-12345678
+
+# Clean up in a specific region
+holodeck cleanup --region us-west-2 vpc-12345678`,
+		Flags: []cli.Flag{
+			&cli.StringFlag{
+				Name: "region",
+				Aliases: []string{"r"},
+				Usage: "AWS region (overrides AWS_REGION env var)",
+				Destination: &m.region,
+			},
+			&cli.BoolFlag{
+				Name: "force",
+				Aliases: []string{"f"},
+				Usage: "Force cleanup without checking job status",
+				Destination: &m.forceDelete,
+			},
+		},
+		Action: func(c *cli.Context) error {
+			if c.NArg() == 0 {
+				return fmt.Errorf("at least one VPC ID is required")
+			}
+			return m.run(c)
+		},
+	}
+
+	return &cleanup
+}
+
+func (m *command) run(c *cli.Context) error {
+	// Determine the region
+	region := m.region
+	if region == "" {
+		region = os.Getenv("AWS_REGION")
+		if region == "" {
+			region = os.Getenv("AWS_DEFAULT_REGION")
+			if region == "" {
+				return fmt.Errorf("AWS region must be specified via --region flag or AWS_REGION environment variable")
+			}
+		}
+	}
+
+	// Create the cleaner
+	cleaner, err := cleanup.New(m.log, region)
+	if err != nil {
+		return fmt.Errorf("failed to create cleaner: %w", err)
+	}
+
+	// Process each VPC ID
+	successCount := 0
+	failCount := 0
+
+	for _, vpcID := range c.Args().Slice() {
+		m.log.Info("Processing VPC: %s", vpcID)
+
+		var err error
+		if m.forceDelete {
+			// Skip job status check
+			err = cleaner.DeleteVPCResources(vpcID)
+		} else {
+			// Check job status first
+			err = cleaner.CleanupVPC(vpcID)
+		}
+
+		if err != nil {
+			m.log.Error(fmt.Errorf("failed to cleanup VPC %s: %v", vpcID, err))
+			failCount++
+		} else {
+			m.log.Info("Successfully cleaned up VPC %s", vpcID)
+			successCount++
+		}
+	}
+
+	if failCount > 0 {
+		return fmt.Errorf("cleanup completed with errors: %d succeeded, %d failed", successCount, failCount)
+	}
+
+	m.log.Info("Cleanup completed successfully: %d VPCs cleaned up", successCount)
+	return nil
+}
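
The nested fallback in `run` amounts to "first non-empty value wins": the `--region` flag, then `AWS_REGION`, then `AWS_DEFAULT_REGION`. The same precedence can be sketched as a small standalone function that is easy to unit test. Note that `resolveRegion` and its injected `getenv` parameter are hypothetical names introduced for illustration and are not part of this commit; the error text here also names all three sources, which is a choice of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveRegion returns the first non-empty candidate: the flag value,
// then AWS_REGION, then AWS_DEFAULT_REGION. getenv is injected so the
// lookup can be exercised without touching the process environment.
func resolveRegion(flag string, getenv func(string) string) (string, error) {
	candidates := []string{flag, getenv("AWS_REGION"), getenv("AWS_DEFAULT_REGION")}
	for _, candidate := range candidates {
		if candidate != "" {
			return candidate, nil
		}
	}
	return "", errors.New("AWS region must be specified via --region, AWS_REGION, or AWS_DEFAULT_REGION")
}

func main() {
	// Resolve against the real environment with no flag value supplied.
	region, err := resolveRegion("", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("region:", region)
}
```

Passing a map-backed lookup instead of `os.Getenv` lets a test cover each precedence level in isolation.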

cmd/cli/main.go

Lines changed: 9 additions & 2 deletions
@@ -19,6 +19,7 @@ package main
 import (
 	"os"
 
+	"github.com/NVIDIA/holodeck/cmd/cli/cleanup"
 	"github.com/NVIDIA/holodeck/cmd/cli/create"
 	"github.com/NVIDIA/holodeck/cmd/cli/delete"
 	"github.com/NVIDIA/holodeck/cmd/cli/dryrun"
@@ -34,14 +35,13 @@ const (
 	ProgramName = "holodeck"
 )
 
-var log = logger.NewLogger()
-
 type config struct {
 	Debug bool
 }
 
 func main() {
 	config := config{}
+	log := logger.NewLogger()
 
 	// Create the top-level CLI
 	c := cli.NewApp()
@@ -68,6 +68,9 @@ Examples:
 # Delete an environment
 holodeck delete <instance-id>
 
+# Clean up AWS VPC resources
+holodeck cleanup vpc-12345678
+
 # Use a custom cache directory
 holodeck --cachepath /path/to/cache create -f env.yaml`
 	c.Version = "0.2.7"
@@ -86,6 +89,7 @@ Examples:
 
 	// Define the subcommands
 	c.Commands = []*cli.Command{
+		cleanup.NewCommand(log),
 		create.NewCommand(log),
 		delete.NewCommand(log),
 		dryrun.NewCommand(log),
@@ -129,6 +133,9 @@ EXAMPLES:
 # Delete an environment
 {{.Name}} delete <instance-id>
 
+# Clean up AWS VPC resources
+{{.Name}} cleanup vpc-12345678
+
 # Use a custom cache directory
 {{.Name}} --cachepath /path/to/cache create -f env.yaml
 

docs/commands/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ commands.
 ## Basic Commands
 
 - [create](create.md) - Create a new environment
+- [cleanup](cleanup.md) - Clean up AWS VPC resources
 - [delete](delete.md) - Delete an existing environment
 - [list](list.md) - List all environments
 - [status](status.md) - Check the status of an environment

docs/commands/cleanup.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+# Cleanup Command
+
+The `cleanup` command deletes AWS VPC resources, with optional GitHub job status
+checking.
+
+## Usage
+
+```bash
+holodeck cleanup [options] VPC_ID [VPC_ID...]
+```
+
+## Description
+
+The cleanup command performs comprehensive deletion of AWS VPC resources including:
+
+- EC2 instances
+- Security groups (with ENI detachment)
+- Subnets
+- Route tables
+- Internet gateways
+- The VPC itself
+
+Before deletion, it can optionally check GitHub Actions job status using VPC tags
+to ensure jobs are completed.
+
+## Options
+
+- `--region, -r`: AWS region (overrides AWS_REGION environment variable)
+- `--force, -f`: Force cleanup without checking GitHub job status
+
+## Environment Variables
+
+- `AWS_REGION`: Default AWS region if not specified via flag
+- `AWS_DEFAULT_REGION`: Fallback region if AWS_REGION is not set
+- `GITHUB_TOKEN`: GitHub token for checking job status (optional)
+
+## Examples
+
+### Clean up a single VPC
+
+```bash
+holodeck cleanup vpc-12345678
+```
+
+### Clean up multiple VPCs
+
+```bash
+holodeck cleanup vpc-12345678 vpc-87654321
+```
+
+### Force cleanup without job status check
+
+```bash
+holodeck cleanup --force vpc-12345678
+```
+
+### Clean up in a specific region
+
+```bash
+holodeck cleanup --region us-west-2 vpc-12345678
+```
+
+## GitHub Job Status Checking
+
+If the VPC has the following tags and `GITHUB_TOKEN` is set:
+
+- `GitHubRepository`: The repository in format `owner/repo`
+- `GitHubRunId`: The GitHub Actions run ID
+
+The command will check if all jobs in that run are completed before proceeding with
+deletion. Use `--force` to skip this check.
+
+## Notes
+
+- The command handles dependencies between resources automatically
+- Security groups attached to ENIs are detached before deletion
+- Non-main route tables are handled appropriately
+- VPC deletion includes retry logic (3 attempts with 30-second delays)
+- Partial failures are logged but don't stop the cleanup process
