ClusterCron InitialRun messages are deduplicated on pod restart, breaking cron chain #5960

@Necmttn

Description

When using ClusterCron.make() with SQL storage, the InitialRun singleton's message gets silently rejected on pod/process restart if a previous "initial" message still exists in the database. This breaks the cron chain and requires manual database cleanup to fix.

Current Behavior

  1. First deployment: InitialRun singleton creates a message with entity_id="initial" and message_id="ClusterCron/{name}/initial/run/"
  2. Cron chain works correctly, scheduling subsequent runs with timestamp-based entity_ids
  3. Pod restarts (or deployment update)
  4. InitialRun singleton attempts to create a new "initial" message
  5. Message is silently rejected due to UNIQUE constraint on message_id in cluster_messages table
  6. Cron chain is broken - no new executions occur

Expected Behavior

Pod restarts should not break the cron chain. Either:

  • InitialRun should detect an existing cron chain and skip re-initialization
  • InitialRun messages should have unique message_ids (e.g., include timestamp)
  • The message should use upsert semantics

Root Cause

The CronPayload class returns an empty string as its PrimaryKey:

// From @effect/cluster/src/ClusterCron.ts
class CronPayload extends Schema.Class<CronPayload>("@effect/cluster/ClusterCron/CronPayload")({
  dateTime: Schema.DateTimeUtc
}) {
  [PrimaryKey.symbol]() {
    return ""  // Empty string means all InitialRun messages have same message_id
  }
  // ...
}

This results in a deterministic message_id:

ClusterCron/{name}/initial/run/

When the InitialRun singleton runs again, it generates the same message_id, which violates the UNIQUE constraint.
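To make the collision concrete, here is a minimal in-memory model of it. This is a sketch, not the library's code: `makeMessageId` and `MessageTable` are hypothetical names, the id layout is inferred from the example id shown above, and a `Set` stands in for the UNIQUE constraint on message_id.

```typescript
// Hypothetical helper: builds an id with the layout seen in this issue,
// i.e. {entity_type}/{entity_id}/{tag}/{primary_key}.
const makeMessageId = (
  entityType: string,
  entityId: string,
  tag: string,
  primaryKey: string
): string => `${entityType}/${entityId}/${tag}/${primaryKey}`

// Hypothetical stand-in for the UNIQUE constraint on message_id.
class MessageTable {
  private readonly ids = new Set<string>()

  // Returns true if the row was stored, false if it was rejected as a duplicate.
  insert(messageId: string): boolean {
    if (this.ids.has(messageId)) return false
    this.ids.add(messageId)
    return true
  }
}

const table = new MessageTable()

// The primary key is "", so nothing in the id varies between process starts.
const initialId = makeMessageId("ClusterCron/MyCron", "initial", "run", "")

const firstBoot = table.insert(initialId) // first deployment: stored
const afterRestart = table.insert(initialId) // after restart: same id, rejected
```

In this model the second insert is rejected because the id is identical, which matches step 5 under Current Behavior.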

Reproduction Steps

  1. Create a ClusterCron with SQL storage:
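import { ClusterCron } from "@effect/cluster"
import { Cron, Effect, Either } from "effect"

const MyCron = ClusterCron.make({
  name: "MyCron",
  // Six-field expression: fires every 5 seconds
  cron: Cron.parse("*/5 * * * * *").pipe(Either.getOrThrow),
  execute: Effect.log("Cron executed")
})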
  2. Run the cluster - observe cron executing every 5 seconds
  3. Stop the process
  4. Wait a moment
  5. Start the process again
  6. Observe that cron no longer executes (InitialRun message rejected)

Workaround

Manually delete the "initial" messages from the database:

DELETE FROM cluster_messages
WHERE entity_id = 'initial'
AND entity_type LIKE 'ClusterCron%';

Suggested Fix

Option A: Include timestamp in InitialRun's entity_id or primary key:

const InitialRun = Singleton.make(
  `ClusterCron/${options.name}`,
  Effect.gen(function*() {
    const client = (yield* CronEntity.client)(`initial-${Date.now()}`) // Unique each time
    // ...
  })
)
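The effect of Option A on the resulting ids can be sketched without the cluster runtime. The helpers below are hypothetical and only mirror the id layout shown under Root Cause; the timestamp is passed in as a parameter so the behavior is deterministic.

```typescript
// Hypothetical helper: entity id for the initial run, suffixed with the start time.
const initialEntityId = (nowMillis: number): string => `initial-${nowMillis}`

// Hypothetical helper: id layout inferred from this issue, with an empty primary key.
const cronMessageId = (name: string, entityId: string): string =>
  `ClusterCron/${name}/${entityId}/run/`

// Two process starts at different times produce different ids,
// so the second one no longer collides with the first.
const firstStart = cronMessageId("MyCron", initialEntityId(1_700_000_000_000))
const secondStart = cronMessageId("MyCron", initialEntityId(1_700_000_060_000))
const idsDiffer = firstStart !== secondStart
```

One consequence of this approach is that every restart leaves another "initial-..." row behind, so a cleanup query would need to match on a prefix rather than on entity_id = 'initial'.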

Option B: Check if cron chain is already active before creating InitialRun message:

const InitialRun = Singleton.make(
  `ClusterCron/${options.name}`,
  Effect.gen(function*() {
    // Check if there's already a pending/scheduled message for this cron
    const hasPending = yield* checkPendingMessages(options.name)
    if (hasPending) return

    const client = (yield* CronEntity.client)("initial")
    // ...
  })
)
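checkPendingMessages above is a placeholder and does not exist in the library. As a rough sketch of the decision it would make, the pure function below works on rows that have already been loaded. The row shape is an assumption: entityType and entityId mirror the entity_type and entity_id columns used in the workaround query, while `processed` is a hypothetical flag for whether a message has already been handled.

```typescript
// Assumed row shape; only entity_type and entity_id are named in this issue.
interface MessageRow {
  readonly entityType: string
  readonly entityId: string
  readonly processed: boolean // hypothetical column
}

// Returns true if the named cron still has an unprocessed, non-initial message,
// i.e. the chain already has a next run scheduled and needs no re-initialization.
const hasPendingCronMessage = (
  rows: ReadonlyArray<MessageRow>,
  name: string
): boolean =>
  rows.some(
    (row) =>
      row.entityType === `ClusterCron/${name}` &&
      row.entityId !== "initial" &&
      !row.processed
  )

// Example: a stale "initial" row plus one scheduled run that has not executed yet.
const rows: ReadonlyArray<MessageRow> = [
  { entityType: "ClusterCron/MyCron", entityId: "initial", processed: true },
  { entityType: "ClusterCron/MyCron", entityId: "1700000005000", processed: false }
]

const chainAlive = hasPendingCronMessage(rows, "MyCron")
const otherCronAlive = hasPendingCronMessage(rows, "OtherCron")
```

Note that when the chain is not alive, the stale "initial" row would still block a new message with the same id, so this check alone does not remove the collision described under Root Cause.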

Option C: Use upsert semantics for persisted messages (might have broader implications)

Environment

  • @effect/cluster version: 0.56.x
  • Storage: SQL (PostgreSQL)
  • Runtime: Bun
  • Platform: Kubernetes

Additional Context

This issue particularly affects Kubernetes deployments where pods are frequently restarted due to:

  • Rolling updates
  • Node scaling
  • Pod evictions
  • Crash restarts

This edge case might not be obvious in development but becomes problematic in production Kubernetes environments with frequent pod restarts.
