
feat: add Terraform deployment for syft-enclave #9433

Open: koenvanderveen wants to merge 4 commits into dev from koen/terraform

koenvanderveen (Collaborator) commented Jul 2, 2026

Summary

  • Add a flat Terraform root module (packages/syft-enclave/terraform/) that provisions the full enclave stack in one apply: GCP APIs, service account + IAM, Secret Manager secret (incl. the token version), and the AMD SEV Confidential Space VM
  • Single dev_mode variable flips the debug bundle (confidential-space-debug image with SSH, container logs to serial, restart Always, encryption off by default) — mirroring start vs start-debug; tee-metadata changes force VM replacement since Confidential Space reads metadata only at boot
  • New just tf-* recipes (tf-init/plan/apply/apply-dev/destroy/redeploy/output/attest/logs/status/ssh) driven by a gitignored terraform.tfvars; existing gcloud recipes unchanged
  • Setup guide in docs/terraform.md (ADC auth, quickstarts, dev iteration flow, day-2 ops, gcloud↔terraform mapping), linked from the package README
  • Harden both deploy paths so no inbound port is opened on the enclave: drop the http-server network tags and firewall-rule remnants; just attest now fetches via SSH + localhost (attestation is published through the peer flow in SYFT_version.json)
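The dev_mode switch and the replace-on-metadata-change behaviour described above could be wired roughly as follows. This is a minimal sketch, not the module in this PR: the resource names, the `image_reference` variable, the machine type, the zone, and the non-dev metadata values are illustrative assumptions, and the encryption toggle is not modelled. It relies on `terraform_data` together with `replace_triggered_by` (Terraform 1.4+), because the Google provider would otherwise update instance metadata in place.

```hcl
variable "dev_mode" {
  type    = bool
  default = false
}

# Hypothetical variable: the workload container image to launch.
variable "image_reference" {
  type = string
}

locals {
  # Debug image family when dev_mode is on, hardened image otherwise.
  image_family = var.dev_mode ? "confidential-space-debug" : "confidential-space"

  # Launcher metadata. The non-dev values are placeholders (assumption).
  tee_metadata = {
    "tee-image-reference"        = var.image_reference
    "tee-container-log-redirect" = var.dev_mode ? "serial" : "false"
    "tee-restart-policy"         = var.dev_mode ? "Always" : "Never"
  }
}

# Tracks the metadata map so that a change can trigger replacement below.
resource "terraform_data" "tee_metadata" {
  input = local.tee_metadata
}

resource "google_compute_instance" "enclave" {
  name         = "syft-enclave"   # assumed name
  machine_type = "n2d-standard-2" # assumed; SEV needs an AMD machine family
  zone         = "us-central1-a"  # assumed zone

  boot_disk {
    initialize_params {
      image = "projects/confidential-space-images/global/images/family/${local.image_family}"
    }
  }

  network_interface {
    network = "default"
  }

  confidential_instance_config {
    enable_confidential_compute = true
  }

  shielded_instance_config {
    enable_secure_boot = true
  }

  scheduling {
    on_host_maintenance = "TERMINATE"
  }

  metadata = local.tee_metadata

  lifecycle {
    # Confidential Space reads metadata only at boot, so recreate the VM
    # instead of updating metadata in place.
    replace_triggered_by = [terraform_data.tee_metadata]
  }
}
```

With this shape, flipping `dev_mode` changes both the image family and the metadata map, so a single apply plans a full VM replacement.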

Fixes from live e2e testing

The flow was verified end-to-end on GCP: a tf-deployed dev-mode enclave ran the full 4-actor flow (enclave + 2 DOs + DS, notebooks/enclave-style steps) — a job referencing both private datasets executed only after both DOs approved, with results delivered to the DS and both DOs. Issues found and fixed along the way:

  • IAM propagation race: a fresh apply booted the VM before the SA's confidentialcomputing.workloadUser grant propagated → the Confidential Space launcher 403s and exits (exit_code=4), leaving a dead VM. Fixed with time_sleep.iam_propagation (120s) between IAM grants and the instance; recovery documented (gcloud compute instances reset)
  • tf-logs: the serial console caps redirected container output (chatty containers go quiet) — now prefers SSH + journalctl with serial fallback for production
  • tf-redeploy: accepts passthrough args (e.g. -auto-approve)
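The IAM propagation fix can be sketched as below. Only `time_sleep.iam_propagation`, the 120s duration, and the `confidentialcomputing.workloadUser` role come from the description above; the other resource names, the `project_id` variable, the account ID, and the instance settings are assumptions for illustration. Confidential-computing and metadata settings are omitted for brevity.

```hcl
terraform {
  required_providers {
    google = { source = "hashicorp/google" }
    time   = { source = "hashicorp/time" }
  }
}

# Hypothetical variable name.
variable "project_id" {
  type = string
}

resource "google_service_account" "enclave" {
  project    = var.project_id
  account_id = "syft-enclave" # assumed account ID
}

resource "google_project_iam_member" "workload_user" {
  project = var.project_id
  role    = "roles/confidentialcomputing.workloadUser"
  member  = "serviceAccount:${google_service_account.enclave.email}"
}

# Wait after the grant is created so it can propagate before first boot.
resource "time_sleep" "iam_propagation" {
  depends_on      = [google_project_iam_member.workload_user]
  create_duration = "120s"
}

resource "google_compute_instance" "enclave" {
  project      = var.project_id
  name         = "syft-enclave"   # assumed name
  machine_type = "n2d-standard-2" # assumed
  zone         = "us-central1-a"  # assumed zone

  boot_disk {
    initialize_params {
      image = "projects/confidential-space-images/global/images/family/confidential-space"
    }
  }

  network_interface {
    network = "default"
  }

  service_account {
    email  = google_service_account.enclave.email
    scopes = ["cloud-platform"]
  }

  # The VM is created only after the sleep finishes, so the launcher
  # should not hit a 403 on a grant that has not propagated yet.
  depends_on = [time_sleep.iam_propagation]
}
```

The sleep only runs when the resource is created, so it delays a fresh apply but adds nothing to later plans where the grant already exists.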

Misc

  • gitignore local JupyterLab state files (.jupyter/, .jupyter_ystore.db)
  • add jupyter-collaboration + jupyter-mcp-tools to dev deps (Jupyter MCP server for agent-driven notebook testing)

Test plan

  • terraform validate + terraform fmt -check clean; just --list parses; tfvars preflight errors helpfully
  • State files, .terraform/, and tfvars are gitignored; .terraform.lock.hcl is committed
  • Live e2e on GCP (see above): tf-apply-dev → attest (GCP_AMD_SEV, secure boot) → 4-actor job flow → tf-destroy (12 resources, APIs left enabled); verified no network tags and external :8080 unreachable

🤖 Generated with Claude Code

Declarative alternative to the gcloud Justfile flow: one terraform apply provisions APIs, service account, IAM, Secret Manager secret (incl. token version), and the SEV Confidential Space VM. A single dev_mode var flips the debug bundle (SSH image, serial container logs, restart Always, encryption off). New just tf-* recipes wrap it; setup guide in docs/terraform.md linked from the README. Also hardens both deploy paths to guarantee no inbound port on the enclave (drop http-server tags and firewall remnants; attest now goes via SSH + localhost).
Findings from a real e2e test (4-actor enclave flow ran successfully against a tf-deployed enclave): a fresh apply boots the VM before the SA's confidentialcomputing.workloadUser grant propagates, so the Confidential Space launcher 403s and exits leaving a dead VM — add a 120s time_sleep between IAM grants and the instance. tf-logs now prefers SSH + journalctl (serial console caps redirected container logs) with serial fallback for production. tf-redeploy accepts passthrough args like -auto-approve.