Skip to content

test: prune redundant tests, convert pure-logic tests to no-DB units, harden test-DB retry#604

Merged
ankitgoswami merged 8 commits into
mainfrom
test/prune-redundant-tests
Jun 29, 2026
Merged

test: prune redundant tests, convert pure-logic tests to no-DB units, harden test-DB retry#604
ankitgoswami merged 8 commits into
mainfrom
test/prune-redundant-tests

Conversation

@ankitgoswami

@ankitgoswami ankitgoswami commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Three test-suite improvements, all pure test changes (no production code):

  1. Remove high-confidence redundant slow tests (E2E + server).
  2. Convert "DB-backed but never queries the DB" tests into no-DB unit tests.
  3. Harden the per-test database setup so a transient TimescaleDB restart no longer cascades into a job-wide failure.

The removals/conversions came from a per-test audit, each re-checked by an adversarial skeptic that defaulted to "keep", so only safe changes remain.

1. E2E removals (test(e2e))

Spec Removed Coverage retained by
activityLogin.spec.ts "Login activity is visible…" the failed-login test asserts the same successful-login row first
addMinersValidation.spec.ts 3 single-category dialog tests + "mix shows Continue anyway" (4) "Shows multiple error categories" + "Continue anyway proceeds"; categorization is unit-tested in networkDiscovery.test.ts
groups.spec.ts manage-security password-mismatch test identical rack-overview and single-miner security tests (kept)
navigation.spec.ts whole file its one remaining test was a navigateTo* chain whose URL assertions re-run in sibling specs

2. Server removals (test(server))

Duplicate DB-backed tests; coverage stays in the named siblings:

  • enrollment: UpdateLastSeenAdvancesTimestamp / …ReturnsNotFound → gateway UploadHeartbeat tests cover the same UpdateLastSeen SQL path
  • domain/pairing: DiscoverWithIPRange_ContinuesScanAfterCollisionSkip → identical to the IPList variant
  • authz: Reconcile_OperatorEditToAdminSurvivesRestart → dominated by …OperatorRevokedSensitivePermStaysRevokedOnRestart
  • timescaledb: GetCombinedMetrics_TemperatureStatusCounts → strict subset of …_Values

3. Convert to no-DB unit tests (test(server))

These provisioned a fully-migrated database per run but never queried it (constructor guards / pre-transaction validation). Coverage kept at zero DB cost:

  • miner: constructor nil-guards + the empty-identifier short-circuit now use an unconnected sql.DB handle instead of GetTestDB.
  • fleetnode/pairing: the agent-report validation cases (private-IP rejection incl. the 8 loopback/link-local/public/multicast cases, invalid IP/port, URL-scheme injection guard, accepted schemes) become one table test against validateReport directly. The InvalidArgument mapping + rollback stays covered by BatchValidationErrorRollsBack.

4. Harden test-DB setup against transient server restarts (test(server))

This PR's first CI run failed with ~70 cascading 57P03 (the database system is not yet accepting connections) errors. Root cause: under the per-test-database model every test replays all migrations, and under that concurrent load a TimescaleDB restart (a CREATE INDEX CONCURRENTLY dropped its connection → server recovered in ~2s via restart: always) made every in-flight migration fail with driver: bad connection and every new connection fail with 57P03. connectAndMigrateWithRetry only retried deadlock/serialization, so the blip failed the whole job.

Fix: treat the transient server-restart errors (57P03, starting-up/shutting-down, bad connection, connection is already closed) as retryable, and wait for the server to accept connections again before recreating the test database for the retry. A test now survives a transient restart instead of failing. Locked in by a no-DB unit test using the exact error strings from the failed run.

Safety

  • Mutual-redundancy pairs resolved by keeping one side, not both: kept the rack-overview security test (dropped group) and the gateway heartbeat tests (dropped enrollment-direct).
  • Tests that look redundant but uniquely guard real SQL/schema/security behavior were left in place (firmware DISTINCT/status queries, CountActiveUnpairedDevices, nmap argv-injection guard, alert-cap boundary, nested-tx read-your-writes, RFC4193 IPv6 round-trip).

Verification

  • go build ./..., go vet ./..., golangci-lint, gofmt clean; all affected server packages pass.
  • New validateReport and isRetryableMigrationError unit tests pass with no DB; the 5 converted miner tests run in 0.00s each.
  • npx playwright test --list loads all specs; eslint + prettier clean.

ankitgoswami and others added 3 commits June 26, 2026 15:22
High-confidence redundant tests whose coverage is fully retained elsewhere
(confirmed by an adversarial review pass):

- activityLogin: "Login activity is visible…" — the failed-login test asserts
  the same successful-login row first.
- addMinersValidation: the three single-category dialog tests and the
  mix-shows-Continue-anyway test — subsumed by "Shows multiple error
  categories" + "Continue anyway proceeds"; categorization is unit-tested.
- groups: the manage-security password-mismatch test — identical to the rack
  and single-miner security tests, which stay.
- navigation.spec.ts: deleted; its only remaining test was a navigateTo*
  chain whose URL assertions are re-run by sibling specs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
High-confidence duplicate DB-backed tests; coverage stays in the named
siblings (confirmed by an adversarial review pass):

- enrollment: UpdateLastSeenAdvancesTimestamp / ReturnsNotFound — the gateway
  UploadHeartbeat tests exercise the same UpdateLastSeen SQL path.
- domain/pairing: DiscoverWithIPRange_ContinuesScanAfterCollisionSkip —
  identical to the IPList variant (same discoverAllPortsForIP path).
- authz: Reconcile_OperatorEditToAdminSurvivesRestart — dominated by
  ...OperatorRevokedSensitivePermStaysRevokedOnRestart.
- timescaledb: GetCombinedMetrics_TemperatureStatusCounts — strict subset of
  ..._Values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These spun up a fully-migrated database per run but never queried it
(constructor guards / pre-transaction validation), paying the per-test
migration cost for nothing. Coverage is retained at no DB cost:

- miner: constructor nil-guards + the empty-identifier short-circuit now use
  an unconnected sql handle instead of GetTestDB.
- fleetnode/pairing: the agent-report validation cases (private-IP rejection,
  invalid IP/port, URL-scheme injection, accepted schemes) move to a table
  test against validateReport directly. The InvalidArgument mapping + rollback
  on a rejected report stays covered by BatchValidationErrorRollsBack.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added javascript Pull requests that update javascript code client server labels Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (ca7d549b74f8f346b032647d5cca878f3128dd22...1974ce57d02c8e6f04841cbb2c6a9c7bcc688112, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: NONE

Findings

No actionable security, correctness, or reliability findings in the scoped diff.

Notes

Reviewed .git/codex-review.diff only. The changes are limited to test deletions/additions and server/internal/testutil/database_setup.go; I did not see production auth, database, gRPC, plugin, discovery, pool/wallet, protobuf, frontend runtime, or infrastructure behavior changed.

The network-discovery validation tests removed from migrated integration coverage are represented by the new direct validateReport unit test, and the remaining integration test still checks invalid-batch rollback/error mapping. I did not run the test suite; this was a static review of the authoritative diff.


Generated by Codex Security Review |
Triggered by: @ankitgoswami |
Review workflow run

Under the per-test-database model every test creates a database and replays
all migrations; under heavy concurrency a TimescaleDB restart (e.g. a crash
during a CREATE INDEX CONCURRENTLY) takes the server down for a second or two.
`restart: always` brings it back, but in that window every in-flight migration
fails with "driver: bad connection" and every new connection with
"the database system is not yet accepting connections (SQLSTATE 57P03)" — so a
single restart cascaded into dozens of unrelated test failures in one job.

connectAndMigrateWithRetry only retried deadlock/serialization errors. Treat
the transient server-restart errors (57P03, starting-up/shutting-down,
bad connection, connection-already-closed) as retryable too, and wait for the
server to accept connections again before recreating the test database for the
retry. A test now survives a transient restart instead of failing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ankitgoswami ankitgoswami changed the title test: prune redundant slow tests; convert pure-logic tests to no-DB unit tests test: prune redundant tests, convert pure-logic tests to no-DB units, harden test-DB retry Jun 26, 2026
@ankitgoswami ankitgoswami marked this pull request as ready for review June 26, 2026 22:44
@ankitgoswami ankitgoswami requested a review from a team as a code owner June 26, 2026 22:44
Copilot AI review requested due to automatic review settings June 26, 2026 22:44

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c02ff34ba1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/testutil/database_setup.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR streamlines and hardens the test suite across server and Playwright E2E by removing redundant tests, converting “no real DB usage” cases into no-DB unit tests, and improving resilience of per-test DB provisioning during transient TimescaleDB restarts.

Changes:

  • Hardened server test DB migration retries to treat transient server-restart errors as retryable, with a readiness wait before recreating the test DB.
  • Converted several server tests that only exercise validation/guard logic into no-DB unit tests (plus added a direct unit test for validateReport).
  • Pruned redundant Playwright E2E specs/tests and removed duplicate server integration tests where coverage is retained elsewhere.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/internal/testutil/database_setup.go Adds retry classification for transient server restart errors and waits for server readiness before DB recreation.
server/internal/testutil/database_setup_test.go Adds unit coverage for retryability classification (non-retryable unique violation; retryable transient restart strings).
server/internal/infrastructure/timescaledb/telemetry_store_test.go Removes a redundant temperature bucketing integration test.
server/internal/domain/pairing/service_test.go Removes a redundant DiscoverWithIPRange collision-skip test variant.
server/internal/domain/miner/testhelpers_test.go Adds a “no DB provision” dependency helper for constructor/guard tests.
server/internal/domain/miner/service_test.go Switches pure-logic tests to use the no-DB helper instead of provisioning a migrated DB.
server/internal/domain/fleetnode/pairing/validate_report_internal_test.go New no-DB unit test covering validateReport rules directly.
server/internal/domain/fleetnode/pairing/integration_test.go Removes DB-backed validation tests now covered via direct validateReport unit coverage.
server/internal/domain/fleetnode/enrollment/integration_test.go Removes redundant UpdateLastSeen tests now covered by gateway heartbeat coverage.
server/internal/domain/authz/reconcile_test.go Removes a redundant reconcile restart test dominated by a stricter sibling test.
client/e2eTests/protoFleet/spec/navigation.spec.ts Removes an entire navigation spec deemed redundant with sibling specs.
client/e2eTests/protoFleet/spec/groups.spec.ts Removes a redundant manage-security password-mismatch E2E scenario.
client/e2eTests/protoFleet/spec/addMinersValidation.spec.ts Removes redundant single-category dialog E2E cases and a mixed-validity “Continue anyway” case.
client/e2eTests/protoFleet/spec/activityLogin.spec.ts Removes a redundant successful-login activity E2E test covered by the failed-login flow.

Comment thread server/internal/testutil/database_setup.go Outdated
Comment thread server/internal/domain/miner/testhelpers_test.go Outdated
ankitgoswami and others added 3 commits June 26, 2026 18:31
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ankit Goswami <ankit.goswami@gmail.com>
The previous commit only made the migration step retry a transient TimescaleDB
restart. But GetTestDB runs the admin DROP DATABASE / CREATE DATABASE before
connectAndMigrateWithRetry, with plain assert.NoError — so a test that *starts*
during the restart still failed on its first DROP/CREATE and never reached the
retry path, leaving the documented cascade only half-covered.

Route the initial create through the same readiness-aware retry: extract a
shared createTestDatabase helper (waits for the server, retries the DROP/CREATE
on the transient-restart errors) and use it both for the initial creation and
for the migration-retry recreate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… error

waitForServerReady returned immediately if the initial ConnectToDatabase
failed, skipping the readiness wait it promises. Probe with a fresh
connect+ping each iteration until the timeout, and log if readiness never
confirms so a later admin-DDL failure is easier to diagnose.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4bb3c845b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/testutil/database_setup.go
A container restart terminates in-flight sessions with
"terminating connection due to administrator command (SQLSTATE 57P01)"
(admin_shutdown), and a crash surfaces 57P02 — neither matched the transient
set, so the very restart this guards against could still fail a migration
before the DB is recreated. Add the class-57 shutdown SQLSTATEs by code.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@edgars-avotins edgars-avotins left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt these e2e tests were slow, but I agree they are mostly covered by other scenarios, so it's not a big loss if these are removed

@ankitgoswami ankitgoswami merged commit 3a30fd7 into main Jun 29, 2026
65 of 67 checks passed
@ankitgoswami ankitgoswami deleted the test/prune-redundant-tests branch June 29, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants