ci: restructure deploy pipeline for ACR bootstrap #294
Merged
Split the 2-job pipeline into 3 jobs:
1. infra: Terraform apply with placeholder images (creates ACR, VNet, Key Vault, storage, container app environment, etc.)
2. build-push: Build Docker image and push to the ACR created in step 1
3. deploy: Terraform apply again with real image references
This fixes the bootstrap problem where ACR didn't exist yet on first deploy. The infra job outputs the ACR login server for build-push.
Also adds terraform_wrapper: false to the infra job so terraform output commands return raw values without wrapper decoration.
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace subscription_id and jira_email with __PLACEHOLDERS__ that the deploy pipeline substitutes from GitHub variables at runtime. This keeps the tfvars committable without exposing personal data.
- Add 'Substitute tfvars placeholders' step to both infra and deploy jobs
- Change .gitignore from *.tfvars to *.tfvars.local (allow committing)
- Set JIRA_EMAIL as GitHub repo variable
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
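The placeholder substitution described above can be sketched in Python. This is only an illustration, not the workflow step from this PR (which is not shown here and is presumably a shell step). The `__NAME__` token pattern, the function name `substitute_placeholders`, and the example values are assumptions.

```python
from __future__ import annotations

import re

# Assumed placeholder syntax: __NAME__ with upper-case letters, digits, underscores.
PLACEHOLDER = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def substitute_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace each __NAME__ token with values[NAME]; raise if a name is unknown."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Failing loudly avoids applying a tfvars file that still holds a token.
            raise KeyError(f"no value supplied for placeholder __{name}__")
        return values[name]

    return PLACEHOLDER.sub(replace, text)


if __name__ == "__main__":
    template = 'subscription_id = "__SUBSCRIPTION_ID__"\njira_email = "__JIRA_EMAIL__"\n'
    # Hypothetical values; in CI they would come from GitHub variables.
    print(substitute_placeholders(
        template,
        {"SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
         "JIRA_EMAIL": "dev@example.com"},
    ))
```

Raising on an unknown placeholder is a design choice of this sketch: a missing GitHub variable then fails the job instead of reaching terraform apply as a literal token.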
After enabling public access on the TF state storage account, poll with az storage container list until the endpoint is reachable (5s intervals, 60s timeout). Replaces blind sleep 30 which was unreliable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CI/CD service principal needs AcrPush on the container registry to push Docker images during the build-push job. Uses the same data.azurerm_client_config.current.object_id pattern as deployer_kv. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure policy enforces shared_access_key_enabled=false. The provider needs storage_use_azuread=true to use Azure AD for storage data plane operations (queue creation, blob container creation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace single KV_BOOTSTRAP_SECRETS JSON blob with individual secrets (GITLAB_TOKEN, GH_PAT_FOR_COPILOT, JIRA_API_TOKEN). Assembled into TF_VAR_kv_bootstrap_secrets env var at apply time. Easier to rotate and clearer in the GitHub UI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
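A minimal sketch of the assembly step, assuming the Terraform variable is a map of secret name to value. The three secret names come from the commit message; the lower-case hyphenated map keys and the function name `build_kv_bootstrap_secrets` are assumptions made for illustration.

```python
from __future__ import annotations

import json

# Individual secrets named in the commit message.
SECRET_NAMES = ("GITLAB_TOKEN", "GH_PAT_FOR_COPILOT", "JIRA_API_TOKEN")


def build_kv_bootstrap_secrets(env: dict[str, str]) -> str:
    """Return a JSON object suitable for TF_VAR_kv_bootstrap_secrets."""
    missing = [name for name in SECRET_NAMES if not env.get(name)]
    if missing:
        raise ValueError(f"missing secrets: {', '.join(missing)}")
    # Assumption: keys are hyphenated lower-case names, since Key Vault
    # secret names may not contain underscores.
    secrets = {name.lower().replace("_", "-"): env[name] for name in SECRET_NAMES}
    return json.dumps(secrets)


if __name__ == "__main__":
    import os

    # Would be exported as TF_VAR_kv_bootstrap_secrets before terraform apply.
    print(build_kv_bootstrap_secrets(dict(os.environ)))
```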
- Add copilot_auth variable ('github_token' | 'byok') to control which
KV secrets and env vars are injected into container apps
- github_token mode: injects GITHUB_TOKEN only (default)
- byok mode: injects COPILOT_API_KEY + COPILOT_PROVIDER_TYPE +
COPILOT_PROVIDER_BASE_URL
- Fix deprecated storage_account_name → storage_account_id on queue
and container resources
- Add copilot_provider_type and copilot_provider_base_url variables
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
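The mode switch above lives in Terraform, but its selection logic can be modelled in a few lines of Python. This is a sketch of the behaviour the commit message describes, not the actual HCL; the function name `copilot_env_names` is hypothetical.

```python
from __future__ import annotations


def copilot_env_names(copilot_auth: str = "github_token") -> list[str]:
    """Return the env var names injected into container apps for a given mode."""
    if copilot_auth == "github_token":
        return ["GITHUB_TOKEN"]
    if copilot_auth == "byok":
        return [
            "COPILOT_API_KEY",
            "COPILOT_PROVIDER_TYPE",
            "COPILOT_PROVIDER_BASE_URL",
        ]
    # Mirrors a variable validation rule: only the two documented values pass.
    raise ValueError(f"copilot_auth must be 'github_token' or 'byok', got {copilot_auth!r}")
```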
- Storage account created with public_network_access_enabled=false
- Private DNS zones (blob, queue, table) linked to VNet
- Private endpoints for all three storage sub-resources
- Storage subnet (snet-storage) added to networking
- NSG rule: container-apps → storage subnet on 443
- KEDA scaler: cloud=Private, endpointSuffix for private queue
- No bootstrap needed: queue/container creation is ARM control-plane
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rride
var-file values take precedence over TF_VAR_ env vars in Terraform.
Having kv_bootstrap_secrets={} in dev.tfvars prevented the pipeline's
TF_VAR_kv_bootstrap_secrets from injecting the actual secrets.
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ACA validates container images at creation time. Using 'placeholder' as image name caused 400 errors. Switch to the ACA quickstart image from MCR which is publicly pullable without ACR auth. Part of #283 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collapsed 3 jobs into 1 sequential flow:
1. terraform apply with MCR quickstart (creates ACR + all infra)
2. docker build + push to ACR
3. terraform apply with real image (updates container apps)
Eliminates placeholder image issues and cross-job state passing. Increased TF state propagation timeout to 120s.
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pipeline now builds and pushes to GHCR first (no infra dependency), then a single terraform apply creates ACR (Premium), imports the image from GHCR via az acr import, and deploys container apps with the ACR-hosted image.
- ACR upgraded to Premium (required for private endpoints)
- ACR private DNS zone + endpoint on storage subnet
- null_resource.acr_import: open ACR public → import → close
- Removed controller_image/job_image vars; replaced with image_tag
- Container apps derive image from ACR login_server + image_tag
- Added packages:write permission for GHCR push
- Closes #295 (ACR hardening)
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The az storage container list probe uses different auth than terraform init's OIDC backend. Polling with az succeeds but terraform still gets 403. Replace the az-based probe with a retry loop around terraform init itself — the definitive test. Part of #283 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
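The retry-until-success idea can be sketched as follows. The pipeline step itself is presumably shell and is not shown here; this Python version only illustrates the loop. The 5s interval and 120s timeout are taken from the commit messages in this PR, while `retry_command` and its injectable `run`, `sleep`, and `clock` parameters are hypothetical names added to keep the sketch testable.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, Sequence


def run_exit_code(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def retry_command(
    argv: Sequence[str],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    run: Callable[[Sequence[str]], int] = run_exit_code,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run argv until it exits 0 or the timeout elapses; return the attempt count."""
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if run(argv) == 0:
            return attempts
        # Give up if another wait would overshoot the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"{' '.join(argv)} still failing after {attempts} attempts")
        sleep(interval_s)


if __name__ == "__main__":
    # Retrying the real command is the point: it uses the same auth path
    # as the backend, unlike a separate az-based probe.
    retry_command(["terraform", "init", "-input=false"])
```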
- Remove ACR public access toggle: az acr import is ARM control-plane and works regardless of network settings
- Add set -euo pipefail to all provisioner heredocs so errors fail fast instead of silently continuing (fixes ACR --public-network-access bug)
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The KEDA azure-queue scaler had no authentication configured and shared_access_key_enabled=false on the storage account, so KEDA could never poll the queue and job executions never triggered.
- Remove cloud=Private and endpointSuffix from KEDA metadata (private DNS handles routing; clients use queue.core.windows.net)
- Add azapi_update_resource to patch the scale rule with the job managed identity (azurerm provider doesn't support this yet)
- Job identity already has Storage Queue Data Contributor role
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Copilot SDK agent may run git commit during a coding session. When this happens, git diff --cached returns empty because the files are already committed. Fix by capturing HEAD SHA before the session and falling back to git diff pre_sha..HEAD when staged diff is empty but HEAD has moved. Part of #283 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
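A minimal sketch of that fallback, assuming the caller has already staged any uncommitted changes and recorded `pre_sha` with `git rev-parse HEAD` before the session. The names `run_git` and `collect_session_diff` are hypothetical; the actual change lives in `_build_coding_result`, which is not shown here.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> str:
    """Run a git subcommand in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def collect_session_diff(
    pre_sha: str, git: Callable[[Sequence[str]], str] = run_git
) -> str:
    """Return the staged diff, or the committed diff if the agent committed."""
    staged = git(["diff", "--cached"])
    if staged.strip():
        return staged
    # Staged diff is empty: check whether HEAD moved during the session.
    head = git(["rev-parse", "HEAD"]).strip()
    if head != pre_sha:
        return git(["diff", f"{pre_sha}..HEAD"])
    return ""
```

Passing `git` as a parameter keeps the decision logic separate from process execution, so it can be exercised without a real repository.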
Belt-and-suspenders: the prompt tells the agent not to git add/commit (primary fix), and _build_coding_result detects agent-committed changes as a fallback (defense-in-depth). Part of #283 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ommands
The task_id was mr-{project}-{mr_iid}, so the second /copilot command
on the same MR would return the cached result from the first command.
Include the note ID to make each command dispatch unique.
Works for both webhook (GitLab includes object_attributes.id) and
poller (passes note.id explicitly).
Part of #283
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
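The effect of adding the note ID can be sketched as below. The `mr-{project}-{mr_iid}` prefix and the `object_attributes.id` field come from the commit message; the position of the note ID in the string and the function names are assumptions.

```python
from __future__ import annotations


def build_task_id(project: str, mr_iid: int, note_id: int) -> str:
    """Build a task ID that is unique per /copilot command, not per MR."""
    # Assumed layout: the note ID is appended to the previous mr-{project}-{mr_iid} form.
    return f"mr-{project}-{mr_iid}-{note_id}"


def note_id_from_webhook(payload: dict) -> int:
    """Read the note ID from a GitLab note webhook payload."""
    return int(payload["object_attributes"]["id"])


if __name__ == "__main__":
    first = build_task_id("42", 7, 1001)
    second = build_task_id("42", 7, 1002)
    # Two commands on the same MR no longer share a cache key.
    print(first, second, first != second)
```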
- Move all workflow_dispatch inputs and vars to env blocks to prevent shell injection via crafted image_tag values (OWASP #3 finding)
- Change deployer ACR role from AcrPush to Container Registry Data Importer and Data Reader (least-privilege for az acr import)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced Mar 16, 2026
What
Restructures the deploy workflow from 2 jobs to 3 jobs to solve the
bootstrap problem: ACR must exist before we can push images.
Changes
Also removes stale acr_login_server, controller_image, job_image from dev.tfvars (now provided via -var flags).
Part of #283