27 changes: 24 additions & 3 deletions .github/workflows/periodic.yaml
@@ -1,5 +1,8 @@
 name: Periodic actions

+# This workflow performs periodic cleanup of AWS resources
+# It uses the holodeck cleanup command to remove VPCs tagged with Project=holodeck and Environment=cicd
+
 on:
   schedule:
     - cron: '0 0,12 * * *' # Runs daily at 12AM and 12PM (UTC)
@@ -15,6 +18,15 @@ jobs:
       - name: Checkout repository
         uses: actions/checkout@v4

+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: 'go.mod'
+          cache: true
+
+      - name: Build holodeck CLI
+        run: make build-cli

Review comment (Contributor): Can we download a stable release instead of building it?

       - name: Set up AWS CLI
         uses: aws-actions/configure-aws-credentials@v4
         with:
@@ -38,9 +50,18 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
-          for vpcid in $AWS_VPC_IDS; do
-            scripts/awscleanup.sh $vpcid
-          done
+          # Use the new holodeck cleanup command
+          # The AWS_REGION is already set by aws-actions/configure-aws-credentials
+          echo "Cleaning up VPCs in region ${{ matrix.aws-region }}: $AWS_VPC_IDS"
+          if [ -n "$AWS_VPC_IDS" ]; then
+            # The cleanup command can handle multiple VPCs at once
+            ./bin/holodeck cleanup $AWS_VPC_IDS || {
+              echo "::warning::Some VPCs failed to clean up in region ${{ matrix.aws-region }}"
+              exit 0 # Don't fail the workflow if some cleanups fail
+            }
+          else
+            echo "No VPCs to clean up in region ${{ matrix.aws-region }}"
+          fi

       - name: Post cleanup
         run: |
17 changes: 17 additions & 0 deletions README.md
@@ -39,6 +39,17 @@ See [docs/prerequisites.md](docs/prerequisites.md) for details.

---

## ⚠️ Important: Kernel Compatibility

When installing NVIDIA drivers, Holodeck requires kernel headers matching your running kernel
version. If exact headers are unavailable, Holodeck will attempt to find compatible ones,
though this may cause driver compilation issues.

For kernel compatibility details and troubleshooting, see
[Kernel Compatibility](docs/prerequisites.md#kernel-compatibility) in the prerequisites documentation.

---

## 📝 How to Contribute

See [docs/contributing/](docs/contributing/) for full details.
@@ -78,6 +89,12 @@ holodeck list
holodeck delete <instance-id>
```

### Example: Clean up AWS VPC resources

```bash
holodeck cleanup vpc-12345678
```

### Example: Check status

```bash
147 changes: 147 additions & 0 deletions cmd/cli/cleanup/cleanup.go
@@ -0,0 +1,147 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package cleanup

import (
	"fmt"
	"os"

	"github.com/NVIDIA/holodeck/internal/logger"
	"github.com/NVIDIA/holodeck/pkg/cleanup"

	cli "github.com/urfave/cli/v2"
)

type command struct {
	log         *logger.FunLogger
	region      string
	forceDelete bool
}

// NewCommand constructs the cleanup command with the specified logger
func NewCommand(log *logger.FunLogger) *cli.Command {
	c := &command{
		log: log,
	}
	return c.build()
}

func (m *command) build() *cli.Command {
	// Create the 'cleanup' command
	cleanup := cli.Command{
		Name:  "cleanup",
		Usage: "Clean up AWS VPC resources",
		Description: `Clean up AWS VPC resources by VPC ID.

This command will:
- Check GitHub job status (if GITHUB_TOKEN is set and tags are present)
- Delete all resources in the VPC including:
* EC2 instances
* Security groups
* Subnets
* Route tables
* Internet gateways
* The VPC itself

Examples:
# Clean up a single VPC
holodeck cleanup vpc-12345678

# Clean up multiple VPCs
holodeck cleanup vpc-12345678 vpc-87654321

# Force cleanup without job status check
holodeck cleanup --force vpc-12345678

# Clean up in a specific region
holodeck cleanup --region us-west-2 vpc-12345678`,
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:        "region",
				Aliases:     []string{"r"},
				Usage:       "AWS region (overrides AWS_REGION env var)",
				Destination: &m.region,
			},
			&cli.BoolFlag{
				Name:        "force",
				Aliases:     []string{"f"},
				Usage:       "Force cleanup without checking job status",
				Destination: &m.forceDelete,
			},
		},
		Action: func(c *cli.Context) error {
			if c.NArg() == 0 {
				return fmt.Errorf("at least one VPC ID is required")
			}
			return m.run(c)
		},
	}

	return &cleanup
}

func (m *command) run(c *cli.Context) error {
	// Determine the region: --region flag, then AWS_REGION, then AWS_DEFAULT_REGION
	region := m.region
	if region == "" {
		region = os.Getenv("AWS_REGION")
	}
	if region == "" {
		region = os.Getenv("AWS_DEFAULT_REGION")
	}
	if region == "" {
		return fmt.Errorf("AWS region must be specified via --region flag or the AWS_REGION or AWS_DEFAULT_REGION environment variable")
	}

	// Create the cleaner
	cleaner, err := cleanup.New(m.log, region)
	if err != nil {
		return fmt.Errorf("failed to create cleaner: %w", err)
	}

	// Process each VPC ID; a failure on one VPC does not stop the others
	successCount := 0
	failCount := 0

	for _, vpcID := range c.Args().Slice() {
		m.log.Info("Processing VPC: %s", vpcID)

		var err error
		if m.forceDelete {
			// Skip job status check
			err = cleaner.DeleteVPCResources(vpcID)
		} else {
			// Check job status first
			err = cleaner.CleanupVPC(vpcID)
		}

		if err != nil {
			// Wrap with %w so the underlying error stays inspectable
			m.log.Error(fmt.Errorf("failed to clean up VPC %s: %w", vpcID, err))
			failCount++
		} else {
			m.log.Info("Successfully cleaned up VPC %s", vpcID)
			successCount++
		}
	}

	if failCount > 0 {
		return fmt.Errorf("cleanup completed with errors: %d succeeded, %d failed", successCount, failCount)
	}

	m.log.Info("Cleanup completed successfully: %d VPCs cleaned up", successCount)
	return nil
}
11 changes: 9 additions & 2 deletions cmd/cli/main.go
@@ -19,6 +20,7 @@ package main
import (
"os"

"github.com/NVIDIA/holodeck/cmd/cli/cleanup"
"github.com/NVIDIA/holodeck/cmd/cli/create"
"github.com/NVIDIA/holodeck/cmd/cli/delete"
"github.com/NVIDIA/holodeck/cmd/cli/dryrun"
@@ -34,14 +35,13 @@ const (
 	ProgramName = "holodeck"
 )

-var log = logger.NewLogger()
-
 type config struct {
 	Debug bool
 }

 func main() {
 	config := config{}
+	log := logger.NewLogger()

 	// Create the top-level CLI
 	c := cli.NewApp()
@@ -68,6 +68,9 @@ Examples:
# Delete an environment
holodeck delete <instance-id>

# Clean up AWS VPC resources
holodeck cleanup vpc-12345678

# Use a custom cache directory
holodeck --cachepath /path/to/cache create -f env.yaml`
c.Version = "0.2.7"
@@ -86,6 +89,7 @@ Examples:

// Define the subcommands
c.Commands = []*cli.Command{
cleanup.NewCommand(log),
create.NewCommand(log),
delete.NewCommand(log),
dryrun.NewCommand(log),
@@ -129,6 +133,9 @@ EXAMPLES:
# Delete an environment
{{.Name}} delete <instance-id>

# Clean up AWS VPC resources
{{.Name}} cleanup vpc-12345678

# Use a custom cache directory
{{.Name}} --cachepath /path/to/cache create -f env.yaml

1 change: 1 addition & 0 deletions docs/commands/README.md
@@ -6,6 +7,7 @@ commands.
## Basic Commands

- [create](create.md) - Create a new environment
- [cleanup](cleanup.md) - Clean up AWS VPC resources
- [delete](delete.md) - Delete an existing environment
- [list](list.md) - List all environments
- [status](status.md) - Check the status of an environment
79 changes: 79 additions & 0 deletions docs/commands/cleanup.md
@@ -0,0 +1,79 @@
# Cleanup Command

The `cleanup` command deletes AWS VPC resources, with optional GitHub job status
checking.

## Usage

```bash
holodeck cleanup [options] VPC_ID [VPC_ID...]
```

## Description

The cleanup command performs comprehensive deletion of AWS VPC resources including:

- EC2 instances
- Security groups (with ENI detachment)
- Subnets
- Route tables
- Internet gateways
- The VPC itself

Before deletion, it can optionally check GitHub Actions job status using VPC tags
to ensure jobs are completed.
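
The resource list above reads as a dependency-ordered teardown: a VPC cannot be deleted while it still contains dependent resources. A minimal Go sketch of that idea follows. It assumes the listed order is also the execution order and that a failed stage aborts the remaining stages; both are assumptions of this sketch, not confirmed behavior. The `step` type, `teardown` function, and `errDemo` value are hypothetical names for illustration, not part of `pkg/cleanup`, and each stage is a stub rather than a real AWS call.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one teardown stage. In real code run would wrap AWS API calls;
// here it is a placeholder (hypothetical type, not holodeck's API).
type step struct {
	name string
	run  func(vpcID string) error
}

// errDemo is a stand-in failure used by the demo in main.
var errDemo = errors.New("DependencyViolation")

// teardown runs the steps in order and stops at the first failure
// (an assumption of this sketch), wrapping the error with the step name.
func teardown(vpcID string, steps []step) error {
	for _, s := range steps {
		if err := s.run(vpcID); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	ok := func(string) error { return nil }
	// Order taken from the list in this document.
	steps := []step{
		{"EC2 instances", ok},
		{"security groups", ok},
		{"subnets", func(string) error { return errDemo }},
		{"route tables", ok},
		{"internet gateways", ok},
		{"VPC", ok},
	}
	fmt.Println(teardown("vpc-12345678", steps))
}
```

Wrapping with `%w` keeps the underlying error available to `errors.Is`, so a caller can still tell a dependency violation apart from other failures.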

## Options

- `--region, -r`: AWS region (overrides AWS_REGION environment variable)
- `--force, -f`: Force cleanup without checking GitHub job status

## Environment Variables

- `AWS_REGION`: Default AWS region if not specified via flag
- `AWS_DEFAULT_REGION`: Fallback region if AWS_REGION is not set
- `GITHUB_TOKEN`: GitHub token for checking job status (optional)
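
The region precedence described above (flag, then `AWS_REGION`, then `AWS_DEFAULT_REGION`) can be sketched as a small standalone Go function. This is an illustration, not the command's code: `resolveRegion` is a hypothetical helper, and the `getenv` parameter is injected only so the lookup can be exercised without changing the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveRegion returns the first non-empty value among the --region flag
// value, AWS_REGION, and AWS_DEFAULT_REGION, in that order.
// Hypothetical helper; pass os.Getenv as getenv in normal use.
func resolveRegion(flagValue string, getenv func(string) string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	for _, name := range []string{"AWS_REGION", "AWS_DEFAULT_REGION"} {
		if v := getenv(name); v != "" {
			return v, nil
		}
	}
	return "", errors.New("AWS region must be set via --region, AWS_REGION, or AWS_DEFAULT_REGION")
}

func main() {
	// With an empty flag value, the result depends on the caller's environment.
	region, err := resolveRegion("", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using region:", region)
}
```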

## Examples

### Clean up a single VPC

```bash
holodeck cleanup vpc-12345678
```

### Clean up multiple VPCs

```bash
holodeck cleanup vpc-12345678 vpc-87654321
```

### Force cleanup without job status check

```bash
holodeck cleanup --force vpc-12345678
```

### Clean up in a specific region

```bash
holodeck cleanup --region us-west-2 vpc-12345678
```

## GitHub Job Status Checking

When `GITHUB_TOKEN` is set and the VPC carries both of the following tags:

- `GitHubRepository`: The repository in format `owner/repo`
- `GitHubRunId`: The GitHub Actions run ID

the command checks whether all jobs in that run have completed before proceeding with
deletion. Use `--force` to skip this check.
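
As a rough illustration of the "all jobs completed" test, the Go sketch below decodes a response shaped like the GitHub REST API's list of jobs for a workflow run and checks each job's `status` field. It is not holodeck's implementation: `runJobs` and `allJobsCompleted` are hypothetical names, the HTTP request itself is left out, and treating an empty job list as "not completed" is a conservative choice made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runJobs mirrors only the fields of the "list jobs for a workflow run"
// response that this check needs (hypothetical type for this sketch).
type runJobs struct {
	Jobs []struct {
		Name   string `json:"name"`
		Status string `json:"status"` // e.g. "queued", "in_progress", "completed"
	} `json:"jobs"`
}

// allJobsCompleted reports whether every job in the response body has
// status "completed". An empty job list counts as not completed.
func allJobsCompleted(body []byte) (bool, error) {
	var r runJobs
	if err := json.Unmarshal(body, &r); err != nil {
		return false, fmt.Errorf("decode jobs response: %w", err)
	}
	if len(r.Jobs) == 0 {
		return false, nil
	}
	for _, j := range r.Jobs {
		if j.Status != "completed" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Sample payload for illustration; one job is still running.
	body := []byte(`{"total_count":2,"jobs":[` +
		`{"name":"e2e","status":"completed"},` +
		`{"name":"lint","status":"in_progress"}]}`)
	done, err := allJobsCompleted(body)
	fmt.Println(done, err)
}
```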

## Notes

- The command handles dependencies between resources automatically
- Security groups attached to ENIs are detached before deletion
- Non-main route tables are handled appropriately
- VPC deletion includes retry logic (3 attempts with 30-second delays)
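- Partial failures are logged but don't stop the cleanup process; when several VPC IDs are given, the remaining VPCs are still processed and the command exits with an error if any of them failed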
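
The retry behavior noted above (3 attempts with 30-second delays) follows a common fixed-delay pattern, sketched here in Go. The `retry` helper and the `errTransient` value are hypothetical names for illustration and are not taken from `pkg/cleanup`; the demo uses a short delay so it finishes quickly, where the documented policy would correspond to `retry(3, 30*time.Second, ...)`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a temporary failure, such as a dependency
// that has not finished detaching yet.
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, sleeping for delay between failed
// attempts, and returns nil on the first success.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay) // no sleep after the final attempt
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for the VPC delete call: fails twice, then succeeds.
	deleteVPC := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retry(3, 10*time.Millisecond, deleteVPC)
	fmt.Println(calls, err)
}
```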