Local backend DB suite cascades to ~540 failures — postgres-test cleanup is fragile between files #234
Reference
owlburtoe/Shiftd#234
Summary
Running the full backend integration suite locally via `pnpm test:backend:db` produces a ~540-failure cascade across most files. This is environmental / test-harness instability, not an application regression: it reproduces on `main` at equivalent magnitude.

This makes local backend integration testing effectively unusable on the affected machine, so flakes like #226 cannot be locally re-verified end-to-end. CI does not run this vitest suite (the release-artifacts job only does schema-push, type-check, build, runtime smoke, migration ordering; `e2e.yml` runs Playwright), so there is currently no automated coverage of the backend integration suite either.
Evidence (controlled comparison)
`main`: `FATAL: the database system is shutting down` (57P03)
`truncateAll()` TRUNCATE failures

Equivalent failure magnitude on both branches → not branch-specific.
Root cause (observed)
The cascade is a cleanup-isolation failure, not a logic failure:
- A test file (`facilityController.test.ts`) fails because its `beforeEach` `truncateAll()` does not reliably clear the DB.
- Leftover rows are still present for the next `seedTestData()`, which then collides:
  - `duplicate key value violates unique constraint "users_pkey"` / `facilities_slug_unique`
  - `insert or update violates foreign key constraint ..._department_id_departments_id_fk` (department parent missing)

The trigger of the initial cleanup failure is non-deterministic:

- The `main` run logged `FATAL: 57P03 the database system is shutting down` (`ProcessStartupPacket`): the `postgres-test` container restarted mid-run (likely Colima/Docker memory pressure).
- `terminating connection` / `Connection terminated` errors.

Either way the `postgres-test` container / connection drops partway through, and `truncateAll()` has no guard against running against a dropped/restarting server.

Suggested mitigations (to scope)
- Check `postgres-test` health between files, and fail fast with a clear message instead of cascading.
- Harden `truncateAll()` so a failed truncate aborts the run (or retries) rather than letting the next `seedTestData()` collide.
- Give `postgres-test` more resources (bump memory; the 57P03 shutdown smells like OOM/restart).

Notes
Not a blocker for #226's fix (PR #230), whose targeted `client.query() while executing` warning was verifiably eliminated (1 → 0 vs main). This issue is purely about local-suite reliability.

Could not reproduce this on current main from a fresh worktree: `pnpm test:backend:db` passed twice locally, including after the hardening changes (133 files / 1472 tests passed).

I still opened PR #241 to harden the harness against the reported failure mode: fail-fast wrapper mode, per-file DB reachability checks, and clearer non-secret reset errors so a dropped `postgres-test` connection stops as one actionable failure instead of cascading into duplicate-key/FK noise.
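
For reference, the fail-fast idea (a reachability check before each reset, and a failed `truncateAll()` aborting instead of falling through to `seedTestData()`) could look roughly like the TypeScript sketch below. This is not the code in PR #241. `QueryClient`, `isConnectionLevelError`, `DbUnavailableError`, and `guardedReset` are hypothetical names, and the SQLSTATE list is an assumption about which errors should count as "server gone".

```typescript
// Sketch only: every name below is hypothetical and not taken from the Shiftd harness.

// Minimal shape of a DB client; anything with a promise-returning query() fits.
interface QueryClient {
  query(sql: string): Promise<unknown>;
}

// PostgreSQL SQLSTATE codes that point at the server or connection, not at the test.
const CONNECTION_SQLSTATES = new Set<string>([
  "57P01", // admin_shutdown
  "57P02", // crash_shutdown
  "57P03", // cannot_connect_now ("the database system is shutting down")
  "08000", // connection_exception
  "08003", // connection_does_not_exist
  "08006", // connection_failure
]);

// Returns true when an error looks like a dropped or restarting server.
function isConnectionLevelError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  if (typeof code === "string" && CONNECTION_SQLSTATES.has(code)) return true;
  return (
    typeof message === "string" &&
    /connection terminated|terminating connection/i.test(message)
  );
}

// Raised so the run stops with one actionable failure instead of a cascade.
class DbUnavailableError extends Error {
  constructor(step: string, cause: unknown) {
    const detail = cause instanceof Error ? cause.message : String(cause);
    super(`postgres-test unavailable during ${step}: ${detail}`);
    this.name = "DbUnavailableError";
  }
}

// Ping first, then truncate; a failed reset never falls through to seeding.
async function guardedReset(
  client: QueryClient,
  truncateAll: () => Promise<void>,
): Promise<void> {
  try {
    await client.query("SELECT 1");
  } catch (err) {
    throw new DbUnavailableError("reachability check", err);
  }
  try {
    await truncateAll();
  } catch (err) {
    if (isConnectionLevelError(err)) {
      throw new DbUnavailableError("truncateAll()", err);
    }
    throw err; // a genuine SQL error still surfaces unchanged
  }
}
```

A `beforeEach` could call `guardedReset(client, truncateAll)` in place of a bare `truncateAll()`. Treating only connection-level errors specially keeps genuine SQL errors visible as ordinary test failures. Stopping the whole run on the first such error would additionally need a bail option on the runner, which is not shown here.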