Description
When using ClusterCron.make() with SQL storage, the InitialRun singleton's message gets silently rejected on pod/process restart if a previous "initial" message still exists in the database. This breaks the cron chain and requires manual database cleanup to fix.
Current Behavior
- First deployment: the InitialRun singleton creates a message with entity_id = "initial" and message_id = "ClusterCron/{name}/initial/run/"
- The cron chain works correctly, scheduling subsequent runs with timestamp-based entity_ids
- The pod restarts (or a deployment update occurs)
- The InitialRun singleton attempts to create a new "initial" message
- The message is silently rejected due to the UNIQUE constraint on message_id in the cluster_messages table
- The cron chain is broken: no new executions occur
Expected Behavior
Pod restarts should not break the cron chain. Either:
- InitialRun should detect an existing cron chain and skip re-initialization
- InitialRun messages should have unique message_ids (e.g., include timestamp)
- The message should use upsert semantics
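A minimal sketch of the second option (unique message_ids). The makeMessageId helper below is an illustrative stand-in for however @effect/cluster actually constructs message ids, not the real implementation; it only shows that folding a timestamp into the primary key makes each restart's initial message distinct:

```typescript
// Hypothetical id construction: name + a timestamp-bearing primary key.
// This mirrors the shape "ClusterCron/{name}/initial/run/{primaryKey}".
const makeMessageId = (name: string, primaryKey: string): string =>
  `ClusterCron/${name}/initial/run/${primaryKey}`

// Two different process starts produce two different ids, so the second
// insert no longer collides with the row left behind by the first.
const firstBoot = makeMessageId("MyCron", `${Date.parse("2024-01-01T00:00:00Z")}`)
const afterRestart = makeMessageId("MyCron", `${Date.parse("2024-01-02T00:00:00Z")}`)

console.log(firstBoot !== afterRestart) // true — no UNIQUE violation on restart
```

Note that with this option the deduplication the empty primary key currently provides is lost, so the InitialRun logic would still need to avoid double-scheduling an already-active chain.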
Root Cause
The CronPayload class has an empty PrimaryKey:
// From @effect/cluster/src/ClusterCron.ts
class CronPayload extends Schema.Class<CronPayload>("@effect/cluster/ClusterCron/CronPayload")({
dateTime: Schema.DateTimeUtc
}) {
[PrimaryKey.symbol]() {
return "" // Empty string means all InitialRun messages have same message_id
}
// ...
}

This results in a deterministic message_id:
ClusterCron/{name}/initial/run/
When the InitialRun singleton runs again, it generates the same message_id, which violates the UNIQUE constraint.
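The collision can be sketched without the library. The makeMessageId helper below is an assumption about the id's shape (entity type, entity id, RPC tag, then the payload's primary key), inferred from the observed "ClusterCron/{name}/initial/run/" value rather than taken from @effect/cluster internals:

```typescript
// Illustrative stand-in for the message_id derivation: the final segment is
// the payload's PrimaryKey, which CronPayload returns as "".
const makeMessageId = (
  entityType: string,
  entityId: string,
  tag: string,
  primaryKey: string
): string => `${entityType}/${entityId}/${tag}/${primaryKey}`

// With an empty primary key, every process start derives the same id:
const firstDeploy = makeMessageId("ClusterCron/MyCron", "initial", "run", "")
const afterRestart = makeMessageId("ClusterCron/MyCron", "initial", "run", "")

console.log(firstDeploy)                      // "ClusterCron/MyCron/initial/run/"
console.log(firstDeploy === afterRestart)     // true — restart collides with the old row
```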
Reproduction Steps
- Create a ClusterCron with SQL storage:
const MyCron = ClusterCron.make({
name: "MyCron",
cron: Cron.parse("*/5 * * * * *").pipe(Either.getOrThrow),
execute: Effect.log("Cron executed")
})

- Run the cluster and observe the cron executing every 5 seconds
- Stop the process
- Wait a moment
- Start the process again
- Observe that cron no longer executes (InitialRun message rejected)
Workaround
Manually delete the "initial" messages from the database:
DELETE FROM cluster_messages
WHERE entity_id = 'initial'
AND entity_type LIKE 'ClusterCron%';

Suggested Fix
Option A: Include timestamp in InitialRun's entity_id or primary key:
const InitialRun = Singleton.make(
`ClusterCron/${options.name}`,
Effect.gen(function*() {
const client = (yield* CronEntity.client)(`initial-${Date.now()}`) // Unique each time
// ...
})
)

Option B: Check if the cron chain is already active before creating the InitialRun message:
const InitialRun = Singleton.make(
`ClusterCron/${options.name}`,
Effect.gen(function*() {
// Check if there's already a pending/scheduled message for this cron
const hasPending = yield* checkPendingMessages(options.name)
if (hasPending) return
const client = (yield* CronEntity.client)("initial")
// ...
})
)

Option C: Use upsert semantics for persisted messages (might have broader implications)
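Option C can be sketched with a Map standing in for the cluster_messages table. Both helpers below are hypothetical; the point is only the contrast between the current insert-or-reject behavior and upsert semantics:

```typescript
type Row = { messageId: string; payload: string }

// Mirrors today's behavior: the UNIQUE constraint silently rejects a
// duplicate message_id, which is what breaks the chain on restart.
const insertOrReject = (table: Map<string, Row>, row: Row): boolean => {
  if (table.has(row.messageId)) return false // silently rejected
  table.set(row.messageId, row)
  return true
}

// Upsert: a restarted InitialRun simply replaces the stale row.
const upsert = (table: Map<string, Row>, row: Row): boolean => {
  table.set(row.messageId, row) // last write wins
  return true
}

const table = new Map<string, Row>()
const initial = { messageId: "ClusterCron/MyCron/initial/run/", payload: "run" }

console.log(insertOrReject(table, initial)) // true  — first deployment
console.log(insertOrReject(table, initial)) // false — restart, chain breaks
console.log(upsert(table, initial))         // true  — restart recovers
```

In SQL terms this would be something like PostgreSQL's INSERT ... ON CONFLICT, though as noted it may have broader implications for other persisted message types that rely on duplicate rejection for deduplication.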
Environment
- @effect/cluster version: 0.56.x
- Storage: SQL (PostgreSQL)
- Runtime: Bun
- Platform: Kubernetes
Additional Context
This issue particularly affects Kubernetes deployments where pods are frequently restarted due to:
- Rolling updates
- Node scaling
- Pod evictions
- Crash restarts
This edge case might not be obvious in development but becomes problematic in production Kubernetes environments with frequent pod restarts.