fix(deps): revert postgres 18→16 (#273 broke staging deploy; prod landmine) #274

Merged
owlburtoe merged 1 commit from fix/revert-postgres-18-major into main 2026-05-30 08:12:43 -04:00

What

Revert Renovate #273's postgres:16-alpine → 18-alpine bump across all 6 affected files (3 compose files + 3 CI workflows), and disable postgres docker major updates in renovate.json.

Why

  • Staging deploy failed (release-artifacts run #4180, deploy-staging job). CI test jobs on PG18 passed because service containers initdb fresh every run; the deploy failed because the persistent shiftd_staging volume was initialized by PG16 and a PG18 server refuses to start on it: FATAL: data directory was initialized by PostgreSQL version 16, not compatible with version 18. deploy.sh's healthcheck loop then timed out.
  • Production landmine: the identical bump sat in docker-compose.prod.yml on the postgres_data volume (real prod data). Prod isn't auto-deployed, but the next manual promote would recreate shiftd-db on PG18 against the PG16 prod volume → production DB won't boot.
  • A postgres major upgrade is a deliberate pg_upgrade/dump-restore migration, not a drop-in. Reverting CI too avoids testing against a different major than we run.
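The failure mode above comes down to the major version recorded in the data directory not matching the major of the image being started. PostgreSQL writes that major to a `PG_VERSION` file at the top of the data directory, so a deploy script could in principle compare the two before recreating the container. The following is only a sketch of that idea and is not part of this PR: the function names are hypothetical, and how the data directory of a named volume such as shiftd_staging is made readable to the script (for example by mounting it into a throwaway container) is left out.

```shell
#!/bin/sh
# Hypothetical pre-deploy guard (illustration only, not in deploy.sh).

# image_major "postgres:16-alpine" prints "16".
image_major() {
  tag="${1##*:}"           # drop everything up to the last colon -> "16-alpine"
  major="${tag%%[!0-9]*}"  # keep only the leading digits         -> "16"
  printf '%s\n' "$major"
}

# check_pg_major <data_dir> <image_ref>
# Returns 0 if the data dir is uninitialized or its major matches the image,
# 1 if the majors differ.
check_pg_major() {
  data_dir="$1"
  image="$2"
  if [ ! -f "$data_dir/PG_VERSION" ]; then
    return 0  # fresh volume: initdb will run, nothing to conflict with
  fi
  data_major="$(cat "$data_dir/PG_VERSION")"
  want_major="$(image_major "$image")"
  if [ "$data_major" != "$want_major" ]; then
    echo "refusing deploy: data dir is PG$data_major, image is PG$want_major" >&2
    return 1
  fi
  return 0
}
```

Called as `check_pg_major /path/to/pgdata postgres:18-alpine` before the container is recreated, a guard like this would fail fast with a clear message instead of leaving the healthcheck loop to time out.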

Changes

  • docker-compose.{yml,prod.yml,test.yml} + .github/workflows/{e2e,security,release-artifacts}.yml: 18-alpine → 16-alpine
  • renovate.json: new rule disabling postgres docker major updates (mirrors the typescript/tailwind toolchain-major policy)
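The exact rule added to renovate.json is not shown on this page. As a rough sketch, a Renovate `packageRules` entry that disables major updates for the postgres docker image could look like the following; it assumes the image is referenced by the bare name `postgres` (a fully qualified name such as a registry-prefixed one would need a different match).

```json
{
  "packageRules": [
    {
      "description": "Postgres majors need a planned pg_upgrade/dump-restore migration",
      "matchDatasources": ["docker"],
      "matchPackageNames": ["postgres"],
      "matchUpdateTypes": ["major"],
      "enabled": false
    }
  ]
}
```

With a rule of this shape, Renovate would still propose minor and patch updates within the 16 line, while any 16 → 17/18 bump has to be made by hand as part of the scheduled migration.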

Effect

Merging re-triggers deploy-staging against the PG16 volume → green. PG18 adoption deferred to a scheduled migration.

Test plan

  • CI green on this PR
  • On merge: deploy-staging succeeds + smoke-staging.sh passes
fix(deps): revert postgres docker tag 18→16; pin major (#273 broke staging deploy)
All checks were successful
Code Scanning / Gitleaks secret scan (pull_request) Successful in 5s
Code Scanning / Semgrep OSS source scan (pull_request) Successful in 31s
Security, Type Check & Runtime / Dependency Audit (pull_request) Successful in 9m33s
Security, Type Check & Runtime / Migration Guardrails (pull_request) Successful in 9m31s
Security, Type Check & Runtime / Type Check (pull_request) Successful in 10m10s
Security, Type Check & Runtime / Backend Runtime Smoke (pull_request) Successful in 10m15s
Release Artifacts / Validate release candidate (pull_request) Successful in 10m51s
Release Artifacts / Validate release ref (pull_request) Has been skipped
Release Artifacts / Build and push Docker release images (pull_request) Has been skipped
Release Artifacts / Deploy to staging (pull_request) Has been skipped
E2E Tests / e2e (pull_request) Successful in 14m12s
3863a922d9
Renovate #273 bumped postgres 16-alpine → 18-alpine across all compose
files and CI workflows. Ephemeral CI service containers initdb fresh so
their jobs passed, but `deploy-staging` failed: the persistent
shiftd_staging volume was initialized by PG16 and a PG18 server refuses
to start on it (FATAL: data directory was initialized by PostgreSQL
version 16, not compatible with version 18), so deploy.sh's healthcheck
loop timed out.

The same bump sat in docker-compose.prod.yml on the postgres_data volume
with real prod data — the next manual promote would have recreated
shiftd-db on PG18 against the PG16 prod volume and taken production down.

A postgres major upgrade is a deliberate pg_upgrade/dump-restore
migration, not a drop-in. Revert to 16-alpine everywhere (keeping CI on
the same major we actually run) and add a renovate rule disabling
postgres docker major updates, matching the existing toolchain-major
policy. PG18 adoption is deferred to a scheduled migration.
owlburtoe deleted branch fix/revert-postgres-18-major 2026-05-30 08:12:43 -04:00