test: prune redundant tests, convert pure-logic tests to no-DB units, harden test-DB retry#604
Conversation
High-confidence redundant tests whose coverage is fully retained elsewhere (confirmed by an adversarial review pass): - activityLogin: "Login activity is visible…" — the failed-login test asserts the same successful-login row first. - addMinersValidation: the three single-category dialog tests and the mix-shows-Continue-anyway test — subsumed by "Shows multiple error categories" + "Continue anyway proceeds"; categorization is unit-tested. - groups: the manage-security password-mismatch test — identical to the rack and single-miner security tests, which stay. - navigation.spec.ts: deleted; its only remaining test was a navigateTo* chain whose URL assertions are re-run by sibling specs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
High-confidence duplicate DB-backed tests; coverage stays in the named siblings (confirmed by an adversarial review pass): - enrollment: UpdateLastSeenAdvancesTimestamp / ReturnsNotFound — the gateway UploadHeartbeat tests exercise the same UpdateLastSeen SQL path. - domain/pairing: DiscoverWithIPRange_ContinuesScanAfterCollisionSkip — identical to the IPList variant (same discoverAllPortsForIP path). - authz: Reconcile_OperatorEditToAdminSurvivesRestart — dominated by ...OperatorRevokedSensitivePermStaysRevokedOnRestart. - timescaledb: GetCombinedMetrics_TemperatureStatusCounts — strict subset of ..._Values. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These spun up a fully-migrated database per run but never queried it (constructor guards / pre-transaction validation), paying the per-test migration cost for nothing. Coverage is retained at no DB cost: - miner: constructor nil-guards + the empty-identifier short-circuit now use an unconnected sql handle instead of GetTestDB. - fleetnode/pairing: the agent-report validation cases (private-IP rejection, invalid IP/port, URL-scheme injection, accepted schemes) move to a table test against validateReport directly. The InvalidArgument mapping + rollback on a rejected report stays covered by BatchValidationErrorRollsBack. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🔐 Codex Security Review
Review SummaryOverall Risk: NONE FindingsNo actionable security, correctness, or reliability findings in the scoped diff. NotesReviewed The network-discovery validation tests removed from migrated integration coverage are represented by the new direct Generated by Codex Security Review | |
Under the per-test-database model every test creates a database and replays all migrations; under heavy concurrency a TimescaleDB restart (e.g. a crash during a CREATE INDEX CONCURRENTLY) takes the server down for a second or two. `restart: always` brings it back, but in that window every in-flight migration fails with "driver: bad connection" and every new connection with "the database system is not yet accepting connections (SQLSTATE 57P03)" — so a single restart cascaded into dozens of unrelated test failures in one job. connectAndMigrateWithRetry only retried deadlock/serialization errors. Treat the transient server-restart errors (57P03, starting-up/shutting-down, bad connection, connection-already-closed) as retryable too, and wait for the server to accept connections again before recreating the test database for the retry. A test now survives a transient restart instead of failing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c02ff34ba1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR streamlines and hardens the test suite across server and Playwright E2E by removing redundant tests, converting “no real DB usage” cases into no-DB unit tests, and improving resilience of per-test DB provisioning during transient TimescaleDB restarts.
Changes:
- Hardened server test DB migration retries to treat transient server-restart errors as retryable, with a readiness wait before recreating the test DB.
- Converted several server tests that only exercise validation/guard logic into no-DB unit tests (plus added a direct unit test for
validateReport). - Pruned redundant Playwright E2E specs/tests and removed duplicate server integration tests where coverage is retained elsewhere.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| server/internal/testutil/database_setup.go | Adds retry classification for transient server restart errors and waits for server readiness before DB recreation. |
| server/internal/testutil/database_setup_test.go | Adds unit coverage for retryability classification (non-retryable unique violation; retryable transient restart strings). |
| server/internal/infrastructure/timescaledb/telemetry_store_test.go | Removes a redundant temperature bucketing integration test. |
| server/internal/domain/pairing/service_test.go | Removes a redundant DiscoverWithIPRange collision-skip test variant. |
| server/internal/domain/miner/testhelpers_test.go | Adds a “no DB provision” dependency helper for constructor/guard tests. |
| server/internal/domain/miner/service_test.go | Switches pure-logic tests to use the no-DB helper instead of provisioning a migrated DB. |
| server/internal/domain/fleetnode/pairing/validate_report_internal_test.go | New no-DB unit test covering validateReport rules directly. |
| server/internal/domain/fleetnode/pairing/integration_test.go | Removes DB-backed validation tests now covered via direct validateReport unit coverage. |
| server/internal/domain/fleetnode/enrollment/integration_test.go | Removes redundant UpdateLastSeen tests now covered by gateway heartbeat coverage. |
| server/internal/domain/authz/reconcile_test.go | Removes a redundant reconcile restart test dominated by a stricter sibling test. |
| client/e2eTests/protoFleet/spec/navigation.spec.ts | Removes an entire navigation spec deemed redundant with sibling specs. |
| client/e2eTests/protoFleet/spec/groups.spec.ts | Removes a redundant manage-security password-mismatch E2E scenario. |
| client/e2eTests/protoFleet/spec/addMinersValidation.spec.ts | Removes redundant single-category dialog E2E cases and a mixed-validity “Continue anyway” case. |
| client/e2eTests/protoFleet/spec/activityLogin.spec.ts | Removes a redundant successful-login activity E2E test covered by the failed-login flow. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Ankit Goswami <ankit.goswami@gmail.com>
The previous commit only made the migration step retry a transient TimescaleDB restart. But GetTestDB runs the admin DROP DATABASE / CREATE DATABASE before connectAndMigrateWithRetry, with plain assert.NoError — so a test that *starts* during the restart still failed on its first DROP/CREATE and never reached the retry path, leaving the documented cascade only half-covered. Route the initial create through the same readiness-aware retry: extract a shared createTestDatabase helper (waits for the server, retries the DROP/CREATE on the transient-restart errors) and use it both for the initial creation and for the migration-retry recreate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… error waitForServerReady returned immediately if the initial ConnectToDatabase failed, skipping the readiness wait it promises. Probe with a fresh connect+ping each iteration until the timeout, and log if readiness never confirms so a later admin-DDL failure is easier to diagnose. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4bb3c845b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
A container restart terminates in-flight sessions with "terminating connection due to administrator command (SQLSTATE 57P01)" (admin_shutdown), and a crash surfaces 57P02 — neither matched the transient set, so the very restart this guards against could still fail a migration before the DB is recreated. Add the class-57 shutdown SQLSTATEs by code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
edgars-avotins
left a comment
There was a problem hiding this comment.
I doubt these e2e tests were slow, but I agree they are mostly covered by other scenarios, so it's not a big loss if these are removed
Summary
Three test-suite improvements, all pure test changes (no production code):
The removals/conversions came from a per-test audit, each re-checked by an adversarial skeptic that defaulted to "keep", so only safe changes remain.
1. E2E removals (
test(e2e))activityLogin.spec.tsaddMinersValidation.spec.tsnetworkDiscovery.test.tsgroups.spec.tsnavigation.spec.tsnavigateTo*chain whose URL assertions re-run in sibling specs2. Server removals (
test(server))Duplicate DB-backed tests; coverage stays in the named siblings:
enrollment:UpdateLastSeenAdvancesTimestamp/…ReturnsNotFound→ gatewayUploadHeartbeattests cover the sameUpdateLastSeenSQL pathdomain/pairing:DiscoverWithIPRange_ContinuesScanAfterCollisionSkip→ identical to theIPListvariantauthz:Reconcile_OperatorEditToAdminSurvivesRestart→ dominated by…OperatorRevokedSensitivePermStaysRevokedOnRestarttimescaledb:GetCombinedMetrics_TemperatureStatusCounts→ strict subset of…_Values3. Convert to no-DB unit tests (
test(server))These provisioned a fully-migrated database per run but never queried it (constructor guards / pre-transaction validation). Coverage kept at zero DB cost:
miner: constructor nil-guards + the empty-identifier short-circuit now use an unconnectedsql.DBhandle instead ofGetTestDB.fleetnode/pairing: the agent-report validation cases (private-IP rejection incl. the 8 loopback/link-local/public/multicast cases, invalid IP/port, URL-scheme injection guard, accepted schemes) become one table test againstvalidateReportdirectly. TheInvalidArgumentmapping + rollback stays covered byBatchValidationErrorRollsBack.4. Harden test-DB setup against transient server restarts (
test(server))This PR's first CI run failed with ~70 cascading
57P03 (the database system is not yet accepting connections)errors. Root cause: under the per-test-database model every test replays all migrations, and under that concurrent load a TimescaleDB restart (aCREATE INDEX CONCURRENTLYdropped its connection → server recovered in ~2s viarestart: always) made every in-flight migration fail withdriver: bad connectionand every new connection fail with57P03.connectAndMigrateWithRetryonly retried deadlock/serialization, so the blip failed the whole job.Fix: treat the transient server-restart errors (
57P03, starting-up/shutting-down,bad connection,connection is already closed) as retryable, and wait for the server to accept connections again before recreating the test database for the retry. A test now survives a transient restart instead of failing. Locked in by a no-DB unit test using the exact error strings from the failed run.Safety
DISTINCT/status queries,CountActiveUnpairedDevices, nmap argv-injection guard, alert-cap boundary, nested-tx read-your-writes, RFC4193 IPv6 round-trip).Verification
go build ./...,go vet ./...,golangci-lint,gofmtclean; all affected server packages pass.validateReportandisRetryableMigrationErrorunit tests pass with no DB; the 5 converted miner tests run in 0.00s each.npx playwright test --listloads all specs; eslint + prettier clean.