fix: stabilize e2e batches (split + wave-staggering), pin jstreemap to 1.28.2, fix @searchable override tests #14937
Open
sarayev wants to merge 8 commits into
Add an additive alternative e2e execution mode that splits the single combined CodeBuild batch into two separate batches (one Linux-only, one Windows-only), both fired against the same AmplifyCLI-E2E-Testing project. This avoids the CoFaWorkflows orchestrator "Internal Service Error" that occurs when one batch fans out to ~250 shards: each orchestration workflow gets roughly half of the graph, and Windows is isolated. No new CodeBuild project or IAM is required.
The generator now emits two extra self-contained batchspecs alongside the untouched combined e2e_workflow_generated.yml: e2e_workflow_linux_generated.yml (prep chain + all l_* shards + Linux aggregate) and e2e_workflow_windows_generated.yml (prep chain + build_windows + all w_* shards + Windows aggregate), each with its own wait_for_ids file. Each batch carries its own copy of the prep chain so the two can be fired independently with no cross-batch ordering; a code comment notes the future optimization of sharing prep via the source-version-keyed S3 cache.
The trigger path is additive: triggerProjectBatch gains an optional buildspec-override argument, a new cloudE2ESplit function fires both batches and prints both batch IDs, and a cloud-e2e-split package script exposes it. A sibling wait-for-all-codebuild-split.ts polls the two batch IDs and aggregates pass/fail. The combined single-batch path (cloudE2E, wait-for-all-codebuild.ts, combined yml) is unchanged.
Verified by regenerating the batchspecs, confirming that both new yml files parse with no dangling depend-on references, and confirming the partition with grep (Linux: prep + 136 l_* shards + aggregate, no w_*/build_windows; Windows: prep + build_windows + 124 w_* shards + aggregate, no l_*). No e2e batch was triggered.
---
Prompt: Implement the 2-batch split (Linux-only + Windows-only batches against the same AmplifyCLI-E2E-Testing project) as an additive alternative to the existing single-batch e2e path, on a new branch off origin/dev. Emit two new batchspecs from the generator, add an alternative trigger path and a two-batch wait-for-all, verify the partition without triggering e2e, then commit, push, and open a draft PR. Do not trigger e2e or add wave-staggering.
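The partition and the "no dangling depend-on references" check described above can be sketched as a filter over the combined job list followed by an edge check. This is a minimal illustration, not the generator's code: the `BatchJob` shape, the `partition` and `danglingDeps` helpers, and the job identifiers in the usage are hypothetical; only the l_/w_ shard prefixes and the build_windows job come from the commit message.

```typescript
// Hypothetical, simplified shape of a batch-graph entry; the generator's real
// types are not shown in this PR.
interface BatchJob {
  identifier: string;
  dependOn?: string[];
}

// Keeps the shared prep chain, any platform-specific extras (for example
// build_windows and the platform aggregate), and the shards with the given prefix.
function partition(
  jobs: BatchJob[],
  prepIds: ReadonlySet<string>,
  extraIds: ReadonlySet<string>,
  shardPrefix: 'l_' | 'w_',
): BatchJob[] {
  return jobs.filter(
    (job) =>
      prepIds.has(job.identifier) ||
      extraIds.has(job.identifier) ||
      job.identifier.startsWith(shardPrefix),
  );
}

// Lists every depend-on edge whose target is missing from the batch.
function danglingDeps(batch: BatchJob[]): string[] {
  const ids = new Set(batch.map((job) => job.identifier));
  const dangling: string[] = [];
  for (const job of batch) {
    for (const dep of job.dependOn ?? []) {
      if (!ids.has(dep)) dangling.push(`${job.identifier} -> ${dep}`);
    }
  }
  return dangling;
}

// Usage with made-up identifiers.
const combined: BatchJob[] = [
  { identifier: 'build_linux' },
  { identifier: 'build_windows', dependOn: ['build_linux'] },
  { identifier: 'l_api_1', dependOn: ['build_linux'] },
  { identifier: 'w_api_1', dependOn: ['build_windows'] },
  { identifier: 'aggregate_linux', dependOn: ['l_api_1'] },
  { identifier: 'aggregate_windows', dependOn: ['w_api_1'] },
];
const prep = new Set(['build_linux']);
const linuxBatch = partition(combined, prep, new Set(['aggregate_linux']), 'l_');
const windowsBatch = partition(combined, prep, new Set(['build_windows', 'aggregate_windows']), 'w_');
```

Forgetting build_windows in the Windows extras would surface as the dangling edge `w_api_1 -> build_windows`, which is the kind of mistake the parse-and-check step guards against.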
Adds an index-offset sliding-window dependency cap (at most 75 concurrently in-progress builds per batch) to the optional split Linux/Windows e2e batch mode, to bound concurrency and improve batch reliability. Each shard at position p depends on the shard at position p-75, so no more than 75 shards are ever in progress at once; positions 1..75 start as soon as prep is ready.
Shard placement uses recorded per-shard durations: the longest-running shards are placed in the terminal (no-successor) slots so that each runs alone, and the remaining shards are paired into the two-deep chains using longest-processing-time (LPT) ordering so that no chain's summed duration exceeds the longest single shard. Projected makespan (excluding prep) is about 194 minutes on Linux and about 153 minutes on Windows.
The split shards use dedicated buildspec variants that always emit a non-empty primary artifact, so a chained successor can always resolve its predecessor's artifact even when a shard produces no coverage or report files. The combined/single-batch path and its buildspecs are left unchanged.
Note: with the depend-on chain, a hard-failing gating shard will skip its dependent; this is mitigated by the existing per-shard retry, and skipped builds can be re-run. The split batch trigger raises the per-build timeout to 360 minutes to give the chained shards extra margin.
---
Prompt: Add a sliding-window concurrency cap to the split linux/windows e2e batches so at most 75 builds run at once per batch, placing the longest shards in no-successor slots to minimize makespan, then commit and trigger the two batches.
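A minimal sketch of the sliding window and the duration-based placement, assuming the shard count n satisfies windowSize < n ≤ 2 × windowSize (true for 136 Linux and 124 Windows shards with a window of 75), so every chain holds one or two shards. The `Shard` type and all helper names are hypothetical. The pairing step uses longest-with-shortest matching, which is what LPT reduces to when each chain may hold at most two shards. Positions are 0-indexed here, whereas the commit message counts from 1.

```typescript
interface Shard {
  id: string;
  minutes: number; // recorded historical duration (hypothetical data)
}

// Position p depends on position p - windowSize, so each residue class
// p mod windowSize is one sequential chain and at most windowSize run at once.
function slidingWindowDeps(order: string[], windowSize: number): Map<string, string | undefined> {
  const deps = new Map<string, string | undefined>();
  order.forEach((id, p) => deps.set(id, p >= windowSize ? order[p - windowSize] : undefined));
  return deps;
}

// Assumes windowSize < n <= 2 * windowSize, so chains hold one or two shards.
function arrange(shards: Shard[], windowSize: number): string[] {
  const n = shards.length;
  if (n <= windowSize || n > 2 * windowSize) throw new Error('sketch assumes W < n <= 2W');
  const sorted = [...shards].sort((a, b) => b.minutes - a.minutes);
  const pairCount = n - windowSize; // early slots, each with one successor
  const singleCount = windowSize - pairCount; // slots with no predecessor or successor
  const singles = sorted.slice(0, singleCount); // longest shards run alone
  const rest = sorted.slice(singleCount); // 2 * pairCount shards, longest first
  const order: string[] = new Array(n);
  for (let i = 0; i < pairCount; i++) {
    order[i] = rest[i].id; // early slot
    order[i + windowSize] = rest[rest.length - 1 - i].id; // its successor
  }
  singles.forEach((s, k) => {
    order[pairCount + k] = s.id;
  });
  return order;
}

// Projected makespan excluding prep: the longest summed chain.
function makespan(order: string[], minutes: Map<string, number>, windowSize: number): number {
  let worst = 0;
  for (let start = 0; start < Math.min(windowSize, order.length); start++) {
    let total = 0;
    for (let p = start; p < order.length; p += windowSize) total += minutes.get(order[p]) ?? 0;
    worst = Math.max(worst, total);
  }
  return worst;
}

// Toy usage with a window of 3 and five shards.
const W = 3;
const shards: Shard[] = [
  { id: 'a', minutes: 10 },
  { id: 'b', minutes: 9 },
  { id: 'c', minutes: 4 },
  { id: 'd', minutes: 3 },
  { id: 'e', minutes: 2 },
];
const order = arrange(shards, W); // ['b', 'c', 'a', 'e', 'd']
const deps = slidingWindowDeps(order, W); // e waits on b, d waits on c
const span = makespan(order, new Map(shards.map((s): [string, number] => [s.id, s.minutes])), W); // 11
```

In this toy input the b+e chain (11 minutes) exceeds the longest single shard (10 minutes), so the "no chain exceeds the longest single shard" property in the commit message is a result of the real duration data rather than something the pairing guarantees by itself.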
The split Linux batchspec previously carried only the prep chain, the l_* shards, and the aggregate job, dropping the non-shard jobs that the combined workflow runs. Carry those jobs (build/test/lint/verify, install and integration checks, and resource cleanup) into the split Linux batch with their original dependency edges so the split execution mode runs the full e2e suite. These jobs are not part of the sliding-window chain; the window cap stays applied only to the l_*/w_* shards. The union of the Linux and Windows split batchspecs' job identifiers now equals the combined workflow's job set. The combined/single-batch path is unchanged.
---
Prompt: The split batches dropped some combined-workflow jobs; add the missing non-shard linux jobs to the split linux batchspec with their existing deps, keep the sliding-window only on the shards, and prove the union of the two split batchspecs equals the combined job set.
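The job-set property this commit checks can be expressed as a set comparison. A minimal sketch with a hypothetical `compareJobSets` helper and made-up identifiers; because both split batches carry their own copy of the prep chain, the union collapses those duplicates before comparing.

```typescript
// Compares the combined workflow's job IDs with the union of both split batches.
function compareJobSets(combined: string[], linux: string[], windows: string[]) {
  const union = new Set([...linux, ...windows]); // shared prep jobs collapse here
  const combinedSet = new Set(combined);
  const missingFromSplit = combined.filter((id) => !union.has(id));
  const extraInSplit = Array.from(union).filter((id) => !combinedSet.has(id));
  return {
    missingFromSplit,
    extraInSplit,
    equal: missingFromSplit.length === 0 && extraInSplit.length === 0,
  };
}

// Before the fix: the non-shard 'lint' job is dropped from the Linux batch.
const combinedJobs = ['build_linux', 'lint', 'l_a', 'build_windows', 'w_a'];
const windowsJobs = ['build_linux', 'build_windows', 'w_a'];
const before = compareJobSets(combinedJobs, ['build_linux', 'l_a'], windowsJobs);
const after = compareJobSets(combinedJobs, ['build_linux', 'lint', 'l_a'], windowsJobs);
```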
…ains
The @searchable e2e tests fail at OpenSearch domain creation because the default domain endpoint TLS policy (Policy-Min-TLS-1-0-2019-07) is no longer accepted by the service, which rejects the request with "Policy-Min-TLS-1-0-2019-07 is disabled" (ValidationException). Add a shared API override (overrides/override-api-gql-searchable.ts) that sets the OpenSearch domain endpoint options to enforce HTTPS and use the Policy-Min-TLS-1-2-2019-07 security policy. Each real searchable e2e test now scaffolds the API override, copies the shared override into the API resource's override.ts, and deploys via amplifyPushOverride instead of amplifyPush. The schema-searchable test is handled by adapting its underlying searchable-usage runner (which has its own self-contained flow) rather than the shared schema-api-directives runner, to avoid affecting unrelated directive tests.
Verification: type-checked the touched test files and the override against amplify-e2e-core/cli-extensibility-helper types (tsc --noEmit, zero errors); eslint clean on the four test files. E2e tests deploy to AWS and are validated in cloud e2e.
---
Prompt: Apply @searchable TLS-policy override (Policy-Min-TLS-1-2-2019-07) to unblock searchable e2e tests.
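A minimal sketch of what the shared override does, assuming the generated OpenSearch domain is reachable as `resources.opensearch.OpenSearchDomain` and uses the CDK-style property names `domainEndpointOptions`, `enforceHttps`, and `tlsSecurityPolicy`. To keep it self-contained, `SearchableStackSlice` is a hypothetical stand-in for the relevant slice of `AmplifyApiGraphQlResourceStackTemplate` from `@aws-amplify/cli-extensibility-helper`; a real override.ts exports its `override` function and is typed against the real template.

```typescript
// Hypothetical stand-in for the part of the API stack template this override touches.
interface DomainEndpointOptions {
  enforceHttps?: boolean;
  tlsSecurityPolicy?: string;
}
interface SearchableStackSlice {
  opensearch?: {
    OpenSearchDomain?: { domainEndpointOptions?: DomainEndpointOptions };
  };
}

// Forces HTTPS and a TLS 1.2 minimum on the @searchable domain, if one exists.
function override(resources: SearchableStackSlice): void {
  const domain = resources.opensearch?.OpenSearchDomain;
  if (!domain) return; // API without @searchable models: nothing to change
  domain.domainEndpointOptions = {
    enforceHttps: true,
    tlsSecurityPolicy: 'Policy-Min-TLS-1-2-2019-07',
  };
}
```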
…ble tests
The prior TLS-1.2 override commit wired `amplify override api` +
`amplifyPushOverride` into all four searchable e2e tests. Two of them
invoke the override while the GraphQL API is still on transformer
version 1: schema-searchable (via the searchable-usage runner, which
uses `addApi({transformerVersion: 1})`) and the v1 phase of
searchable-migration. `amplify override api` only scaffolds override.ts
and emits the "Do you want to edit override.ts file now?" prompt for
transformer v2 APIs; for v1 it prints a "use transformer version 2"
warning and exits without prompting. The nexpect helper waits for that
prompt, so the spawn ends with the wait/sendNo/sendEof still queued,
producing "Non-empty queue on spawn exit" (the confirmed, non-flaky
failure).
override.ts is a v2/CDK-only mechanism, so it can neither be scaffolded
nor applied at the v1 stage. Revert those two v1 tests to their prior
`amplifyPush` flow to remove the hang. The two genuinely-v2 tests
(api_6c and searchable-datastore) keep the override wiring, where the
helper works and the TLS-1.2 policy is correctly applied. The v1
searchable tests' OpenSearch TLS-1.0 problem cannot be solved through
`amplify override api` and needs a separate v1-compatible mechanism.
Verification: tsc --build of amplify-e2e-tests passes with zero errors;
eslint clean on both reverted files.
---
Prompt: Fix the broken `amplify override api` test wiring that we added
for the @searchable TLS override, on branch
feat/e2e-split-linux-windows-batches. Validate the override step
LOCALLY (no AWS deploy), commit, and push so the e2e can re-run.
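The "Non-empty queue on spawn exit" failure described above can be reproduced with a toy model of an expect-style helper. This is not nexpect's implementation: `Step` and `runScript` are hypothetical, and apart from the override prompt quoted in the commit message the output strings are placeholders rather than real CLI output. The model captures one property: steps queued behind a wait are never consumed if the awaited prompt is never printed.

```typescript
type Step = { kind: 'wait'; text: string } | { kind: 'send'; input: string };

function runScript(outputLines: string[], steps: Step[]): void {
  const queue = [...steps];
  // Sends fire as soon as every earlier wait has been satisfied.
  const flushSends = (): void => {
    while (queue.length > 0 && queue[0].kind === 'send') queue.shift();
  };
  flushSends();
  for (const line of outputLines) {
    const head = queue[0];
    if (head !== undefined && head.kind === 'wait' && line.includes(head.text)) {
      queue.shift();
      flushSends();
    }
  }
  // The process has exited; anything still queued was never consumed.
  if (queue.length > 0) {
    throw new Error(`Non-empty queue on spawn exit (${queue.length} step(s) left)`);
  }
}

const overrideSteps: Step[] = [
  { kind: 'wait', text: 'Do you want to edit override.ts file now?' },
  { kind: 'send', input: 'n' }, // stands in for sendNo
  { kind: 'send', input: '<EOF>' }, // stands in for sendEof
];
// Transformer v2: the prompt is printed, so the whole queue drains.
runScript(['Do you want to edit override.ts file now?'], overrideSteps);
```

With transformer v1 output (a warning and no prompt), the same steps leave all three entries queued and `runScript` throws, matching the deterministic failure the commit describes.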
jstreemap releases 1.29.1-1.29.3 shipped a broken UMD bundle that threw "ReferenceError: self is not defined" on Node, crashing amplify-category-function during `amplify init` and causing SEV-2s. PR #14922 mitigated this by pinning jstreemap to 1.28.2, but the risk vector remained. The package only used jstreemap's TreeSet, and solely for membership checks and a single max-value lookup. These are trivially served by the native Set: all `new TreeSet()` instances become `new Set<number>()`, the `TreeSet<number>` type annotations become `Set<number>`, and the one `daysOfMonth.last()` call becomes `Math.max(...this.daysOfMonth)`. Removing the dependency entirely permanently eliminates the broken-bundle risk. Removed the `jstreemap` entry from the package manifest and regenerated yarn.lock to drop all jstreemap entries.
Testing: tsc clean, cron expression tests 7/7 passing, and zero remaining jstreemap/TreeSet references across the package.
---
Prompt: Can we get rid of jstreemap in our repo (amplify-cli) completely? Implement that! Prepare a PR for it.
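The native-`Set` replacement described above can be sketched as follows. `DaysOfMonthSketch` is a hypothetical wrapper, not the package's class; only the `Set<number>` type and the `Math.max` replacement for `last()` come from the commit message.

```typescript
class DaysOfMonthSketch {
  private readonly daysOfMonth = new Set<number>();

  add(day: number): void {
    this.daysOfMonth.add(day);
  }

  // Membership check, previously served by TreeSet.
  has(day: number): boolean {
    return this.daysOfMonth.has(day);
  }

  // Replaces TreeSet's last(); returns -Infinity for an empty set.
  last(): number {
    return Math.max(...Array.from(this.daysOfMonth));
  }
}

const days = new DaysOfMonthSketch();
days.add(5);
days.add(31);
days.add(12);
```

Unlike a TreeSet, a `Set` iterates in insertion order rather than sorted order, so the swap is only safe because, per the commit message, the set was used solely for membership checks and one maximum lookup.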
In the split-batch sliding-window chain, a shard at an "early" position
(index < n - windowSize) is the depend-on predecessor of the shard
windowSize positions later. A CodeBuild dependency only resolves on predecessor
SUCCESS, so an intermittently-failing ("red") shard placed in an early
slot would skip its downstream successor whenever it failed, masking
unrelated coverage.
Add a KNOWN_RED_SHARD_FRAGMENTS list and swap any matching shard out of
an early slot into a terminal (no-successor) slot during sliding-window
arrangement, so no shard ever depends on a red one and a red failure
never cascades. Fragments are matched as substrings to cover merged
bundle identifiers (e.g. l_searchable_datastore_schema_searchable). The
makespan projection is recomputed from the final placement so it
reflects the swaps. Regenerated both split workflow specs; all known-red
shards (7 Linux, 8 Windows) now have zero successors while retaining
their upstream dependency.
---
Prompt: Fold the jstreemap fix into the waving branch, then verify and
complete the terminal-placement reorder: place the known-red shards
(containers_api_1/2/secrets, custom_policies_container, schema_searchable,
searchable_migration, searchable_datastore, api_6c and their bundles) in
terminal slots on both Linux and Windows so no job depends on them,
regenerate the workflow, verify, build, commit and push.
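The swap can be sketched as a pass over the early slots that exchanges each matching shard with a non-red shard taken from the terminal slots. Only the `KNOWN_RED_SHARD_FRAGMENTS` name and substring matching come from the commit; the helpers, the end-first scan order, and the fragment subset below are illustrative assumptions.

```typescript
// Illustrative subset only; the real list lives in the generator.
const KNOWN_RED_SHARD_FRAGMENTS: readonly string[] = ['schema_searchable', 'searchable_datastore', 'api_6c'];

// Substring match so merged bundles (e.g. l_searchable_datastore_schema_searchable) also count.
function isRed(id: string, fragments: readonly string[]): boolean {
  return fragments.some((fragment) => id.includes(fragment));
}

// Early slots (p < n - windowSize) gate the shard windowSize positions later;
// terminal slots (p >= n - windowSize) gate nothing.
function moveRedToTerminal(order: string[], windowSize: number, fragments: readonly string[]): string[] {
  const out = [...order];
  const firstTerminal = Math.max(0, out.length - windowSize);
  let candidate = out.length - 1; // scan terminal slots from the end
  for (let p = 0; p < firstTerminal; p++) {
    if (!isRed(out[p], fragments)) continue;
    while (candidate >= firstTerminal && isRed(out[candidate], fragments)) candidate--;
    if (candidate < firstTerminal) throw new Error('more red shards than terminal slots');
    [out[p], out[candidate]] = [out[candidate], out[p]];
    candidate--; // that slot now holds a red shard
  }
  return out;
}

// Window of 3 over five shards: positions 0-1 are early, 2-4 are terminal.
const arranged = moveRedToTerminal(['l_a', 'l_schema_searchable', 'l_c', 'l_d', 'l_e'], 3, KNOWN_RED_SHARD_FRAGMENTS);
// arranged: ['l_a', 'l_e', 'l_c', 'l_d', 'l_schema_searchable']
```

Because each swap changes which shards share a chain, the makespan projection has to be recomputed from the final placement, as the commit notes.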
…moving it
Replace the previous removal of the jstreemap dependency with an exact version pin. jstreemap 1.29.x ships a broken UMD bundle that references bare `self`, causing `ReferenceError: self is not defined` under Node. Pinning to the exact "1.28.2" (no caret) keeps the working TreeSet-based cron expression generator while preventing resolution to the broken 1.29.x releases. This restores cronExpression.ts to its original TreeSet implementation (reverting the native Set workaround) and re-adds the dependency to package.json and yarn.lock at exactly 1.28.2.
Tested with: tsc build of amplify-category-function (clean) and the cron suite scheduleWalkthrough.test.ts (7/7 passing).
---
Prompt: On branch feat/e2e-split-linux-windows-batches (PR #14937): replace the jstreemap REMOVAL with a jstreemap PIN (1.28.2), keeping everything else. The final PR diff (vs base dev) must contain ONLY: (a) waving + terminal-placement of known-red shards, (b) jstreemap PIN, (c) searchable fixes. Build, verify, commit, push.
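Why the exact pin matters can be shown with a simplified model of npm-style version ranges. The sketch handles only `^X.Y.Z` ranges with X ≥ 1 and bare exact versions, ignoring prereleases and all other range syntax; real resolution is done by the package manager's semver implementation, not by this hypothetical `satisfies` helper.

```typescript
type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) if (a[i] !== b[i]) return a[i] - b[i];
  return 0;
}

// "^X.Y.Z" (X >= 1) means >= X.Y.Z and < (X+1).0.0; a bare "X.Y.Z" matches only itself.
function satisfies(version: string, range: string): boolean {
  const v = parse(version);
  if (range.startsWith('^')) {
    const base = parse(range.slice(1));
    if (base[0] === 0) throw new Error('sketch only handles ^ with major >= 1');
    return compare(v, base) >= 0 && v[0] === base[0];
  }
  return compare(v, parse(range)) === 0;
}

const caretAcceptsBroken = satisfies('1.29.3', '^1.28.2'); // a caret range still admits 1.29.x
const exactAcceptsBroken = satisfies('1.29.3', '1.28.2'); // the exact pin does not
```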
adrianjoshua-strutt approved these changes Jun 24, 2026
Description of changes
This PR groups three changes that stabilize the end-to-end (e2e) test pipeline and fix a dependency regression.
E2E batch stabilization (split + wave-staggering)
The combined e2e CodeBuild batch became unreliable to provision when a large number of build groups were scheduled to start at once. This splits the single batch into separate Linux and Windows batches, and applies an index-offset sliding window so that at most windowSize shard builds (75 per batch) are in flight at any time (build i depends on build i − windowSize; the first window starts immediately). Known-flaky/failing shards are placed in terminal slots with no successors, so a failure in one of them cannot cascade and skip unrelated downstream shards. Adds the split buildspecs and a helper to wait across both batches.
Pin `jstreemap` to 1.28.2
`jstreemap` 1.29.1–1.29.3 publish a broken UMD bundle that references a bare `self`, which is undefined in Node.js. This throws `ReferenceError: self is not defined` at `require` time and breaks `amplify init` for the function category. Pin to the last known-good version, `1.28.2`.
`@searchable` override e2e tests
Newer OpenSearch domains reject the previously defaulted minimum TLS policy (`Policy-Min-TLS-1-0-2019-07`). For transformer v2 `@searchable` tests, set `Policy-Min-TLS-1-2-2019-07` via the API override. The override mechanism is v2-only, so the v1 tests are reverted to their prior push flow (a v1-compatible TLS fix is tracked separately).
Description of how you validated changes
`amplify-category-function` cron parser unit tests pass.
Checklist
`yarn test` passes
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.