We are using River fully programmatically in a Go service using rivermigrate.Migrate Go api as mentioned here: https://riverqueue.com/docs/migrations.
We observed that the river migration process can leave the database in a partially-applied state when the service crashes or restarts in the middle of the migration. Sharing the steps below to reproduce the issue:
-
River successfully applies all migration files having DDL commands (creates tables / indexes / constraints) (ex.: riverqueue/river/riverdriver/riverdatabasesql/migration/main/001_create_river_migration.up.sql)
-
Schema versions applied is yet to be recorded in river_migration table and the application crashes or restarts
-
On the next startup, River retries the same full migration as there are no version entries recorded in river_migration table and the migration fails because the DDL already exists(PostgresSql errors like: relation already exists or index already exists)
This requires manual intervention against River’s internal tables, which is undesirable for application operators.
Expected Behavior:
River migrations should be:
- Atomic (DDL + schema version bookkeeping happen together in single transaction)
or
- DDL commands are Idempotent (safe to re-run)
This ensures that applications will not require to touch internal River tables manually.
Because of this potential issue, it prevents the service from starting until River’s tables are manually removed. Please suggest on how to proceed in such situations and how others are addressing this potential issue?