# Self-hosted v4.4.3: docker-provider never processes runs (PENDING indefinitely) #3279

## Environment

- **trigger.dev version**: v4.4.3 (webapp, coordinator and docker-provider all on the same version)
- **Deployment**: Docker Swarm via Portainer CE
- **OS**: Debian 12 Bookworm
- **Docker**: 29.3.0
- **PostgreSQL**: 16
- **Redis**: 7
- **ElectricSQL**: latest
- **ClickHouse**: 25.8 (external, shared)

## Description

Runs triggered via the API are accepted and enqueued in Redis, but the docker-provider never processes them. They remain in `PENDING` status indefinitely. Runs in the Development environment (via `npx trigger.dev dev`) execute normally, but deployed runs in the Production environment do not.

## What works

- Webapp: running, healthy, API responds correctly ✓
- Coordinator: connects via WebSocket, receives `DYNAMIC_CONFIG` ✓
- Docker-provider: connects via WebSocket, receives `SERVER_READY` + `PRE_PULL_DEPLOYMENT` ✓
- ElectricSQL: running, replication active ✓
- Deployments: `DEPLOYED` status in Production (ic3etnvx, 20260326.2, 2 tasks) ✓
- Dev mode: the health-check task run reaches `COMPLETED_SUCCESSFULLY` in Development ✓
- Registry: `docker login` works, credentials configured ✓
- API trigger: returns a run ID successfully ✓

## What doesn't work

- Runs in Production stay `PENDING` forever
- Docker-provider shows zero activity after `SERVER_READY` (no pull, spawn, create, task, or run logs)
- Zero ephemeral task containers are ever created
- `TaskRunAttempt` table: 0 rows (no attempts ever recorded)

## Detailed investigation

### 1. Redis queues have the messages

```
engine:runqueue:workerQueue:cmn40rrgz0005qu1rihgeecsx-default → 3 messages (list type)
engine:runqueue:{org:...}:message:cmn7oqa7100011rqiy79ocv3w
engine:runqueue:{org:...}:message:cmn7nzcrp00001rqi88hoff19
engine:runqueue:{org:...}:message:cmn7nhm0v00091robjnlq74fg
```

Messages are correctly enqueued but never dequeued.
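One useful check is whether the enqueued messages are in the same queue that the worker group reads from. A small helper can extract the worker queue name from a key like the ones above and compare it with the group's `masterQueue` value. This is a sketch under assumptions. The key layout `engine:runqueue:workerQueue:<name>` is inferred only from the keys listed in this issue and is not a documented format. `worker_queue_name` and `matches_master_queue` are hypothetical helpers.

```python
from typing import Optional

# Prefix inferred from the Redis keys observed in this issue
# (an assumption, not a documented trigger.dev key format).
WORKER_QUEUE_PREFIX = "engine:runqueue:workerQueue:"


def worker_queue_name(key: str) -> Optional[str]:
    """Return the worker queue name encoded in a Redis key, or None."""
    if not key.startswith(WORKER_QUEUE_PREFIX):
        return None
    name = key[len(WORKER_QUEUE_PREFIX):]
    # An empty suffix means the key carries no queue name.
    return name or None


def matches_master_queue(key: str, master_queue: str) -> bool:
    """True if the key's worker queue equals the group's masterQueue value."""
    return worker_queue_name(key) == master_queue


if __name__ == "__main__":
    key = "engine:runqueue:workerQueue:cmn40rrgz0005qu1rihgeecsx-default"
    print(worker_queue_name(key))
    print(matches_master_queue(key, "cmn40rrgz0005qu1rihgeecsx-default"))
```

To gather candidate keys, use `redis-cli --scan --pattern 'engine:runqueue:workerQueue:*'`. For list-type keys, `LLEN <key>` returns the message count.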

### 2. SharedQueueConsumer reports no messages

The webapp logs show:
```json
{"reasonStats":{"no_message_dequeued":10},"actionStats":{},"outcomeStats":{"noop":10}}
```

The consumers keep iterating but never dequeue anything, even though the worker queue holds messages.
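A short parser can tally a longer capture of these stats lines and confirm that every consumer iteration ends as an empty dequeue. This is a minimal sketch. `summarize_consumer_stats` is a hypothetical helper. It assumes each stats line is a standalone JSON object with the same field names as the line above, and that `outcomeStats` records one outcome per iteration.

```python
import json
from typing import Any, Dict


def summarize_consumer_stats(line: str) -> Dict[str, Any]:
    """Summarize one SharedQueueConsumer stats line from the webapp logs.

    Field names (reasonStats, actionStats, outcomeStats) are copied from the
    log line in this issue; other log shapes are not handled.
    """
    stats = json.loads(line)
    reasons = stats.get("reasonStats", {})
    outcomes = stats.get("outcomeStats", {})
    # Assumption: outcomeStats holds one outcome per consumer iteration.
    iterations = sum(outcomes.values())
    empty = reasons.get("no_message_dequeued", 0)
    return {
        "iterations": iterations,
        "empty_dequeues": empty,
        # True when every counted iteration was an empty dequeue.
        "always_empty": iterations > 0 and empty == iterations,
        "actions_taken": sum(stats.get("actionStats", {}).values()),
    }
```

Applied to the line above, this reports 10 iterations, 10 empty dequeues and no actions.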

### 3. WorkerInstanceGroup was manually created

The `WorkerInstanceGroup` table was empty (0 rows), so we created a group manually and set it as the project's default:
```sql
INSERT INTO "WorkerInstanceGroup" (id, type, name, "masterQueue", hidden, "tokenId", "organizationId", "projectId", ...)
VALUES ('...', 'MANAGED', 'default', '<projectId>-default', false, '<tokenId>', '<orgId>', '<projectId>', ...);

UPDATE "Project" SET "defaultWorkerGroupId" = '<groupId>' WHERE id = '<projectId>';
```

After this change, the API stopped returning `"No worker group found"` and began accepting runs, but the runs still do not execute.
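Because the group was created by hand, it may help to re-check the link between the project and the group. The sketch below assumes the `"Project"` and `"WorkerInstanceGroup"` rows have already been fetched (for example with `psql`) into plain dictionaries. `check_worker_group_link` is a hypothetical helper. Its checks are illustrative and are not trigger.dev's own validation logic.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def check_worker_group_link(project: Row, groups: List[Row]) -> List[str]:
    """Return problems found with a project's default worker group link.

    `project` holds the "Project" row's id and defaultWorkerGroupId;
    `groups` holds "WorkerInstanceGroup" rows (id, masterQueue, projectId).
    An empty list means none of these checks failed.
    """
    problems: List[str] = []
    group_id = project.get("defaultWorkerGroupId")
    if not group_id:
        problems.append("Project.defaultWorkerGroupId is not set")
        return problems
    group = next((g for g in groups if g.get("id") == group_id), None)
    if group is None:
        problems.append(f"no WorkerInstanceGroup with id {group_id!r}")
        return problems
    # Only flag projectId when it is set and points elsewhere.
    if group.get("projectId") not in (None, project.get("id")):
        problems.append("group.projectId points at a different project")
    if not group.get("masterQueue"):
        problems.append("group.masterQueue is empty")
    return problems
```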

### 4. Environment vars added to provider/coordinator

Initially, the docker-provider and coordinator services were missing `DATABASE_URL`, `REDIS_HOST`, `REDIS_PORT` and `REDIS_PASSWORD`. We added them with the same values the webapp uses. The behavior did not change.
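To confirm that the added values really match the webapp's, diff the environments of two services. This is a sketch: the dictionaries are assumed to come from the Swarm service definitions (for example via `docker service inspect`), and `env_mismatches` is a hypothetical helper. It only compares values. It says nothing about whether the provider or coordinator actually reads these variables.

```python
from typing import Dict, Iterable, List

# Variables this issue reports as initially missing from the
# provider and coordinator services.
ADDED_VARS = ("DATABASE_URL", "REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD")


def env_mismatches(
    webapp: Dict[str, str],
    service: Dict[str, str],
    names: Iterable[str] = ADDED_VARS,
) -> List[str]:
    """List variables missing from `service` or differing from the webapp."""
    problems: List[str] = []
    for name in names:
        if name not in service:
            problems.append(f"{name}: missing")
        elif service[name] != webapp.get(name):
            problems.append(f"{name}: differs from webapp")
    return problems
```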

### 5. Docker-provider logs (complete from startup)

```
new zod socket → ws://trigger-webapp:3030/provider
new zod socket → ws://trigger-webapp:3030/shared-queue
Initializing task operations
server listening on port 8809
connect (socket-provider) ✓
connect (socket-shared-queue) ✓
Incoming event SERVER_READY ✓
No checkpoint support: Please enable docker experimental features.
Simulation mode enabled. Containers will be paused, not checkpointed.
```

After this: **complete silence**. No dequeue, no pull, no spawn, no task activity.
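The evidence here is the absence of log lines. Over a longer capture, for example `docker service logs` output saved to a file, a small filter can show whether anything run-related ever appears after `SERVER_READY`. This is a keyword heuristic and not a parser for the provider's log format. `activity_after_ready` is a hypothetical helper, and its keywords are taken from the list in "What doesn't work".

```python
import re
from typing import List

# Keywords from this issue's list of missing activity (pull, spawn, create,
# task, run); matching on word prefixes is a heuristic, not a log schema.
ACTIVITY = re.compile(r"\b(pull|spawn|create|task|run)\w*", re.IGNORECASE)


def activity_after_ready(lines: List[str], marker: str = "SERVER_READY") -> List[str]:
    """Return log lines after the first `marker` line that mention run activity."""
    for i, line in enumerate(lines):
        if marker in line:
            return [entry for entry in lines[i + 1:] if ACTIVITY.search(entry)]
    # Marker never seen: nothing to report.
    return []
```

Run against the startup log above, it returns an empty list. The only `task` line, "Initializing task operations", appears before `SERVER_READY`.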

### 6. Coordinator logs (complete from startup)

```
Docker mode
connecting → ws://trigger-webapp:3030/coordinator
server listening on port 9020
connect (socket-coordinator) ✓
Incoming event DYNAMIC_CONFIG ✓
Handling DYNAMIC_CONFIG (version v1, checkpointThresholdInMs 30000)
No checkpoint support: Please enable docker experimental features.
Simulation mode enabled.
```

After this: only healthcheck `/health` requests. Zero run-related activity.

## Questions

1. Is the `WorkerInstanceGroup` supposed to be created automatically? In our self-hosted setup, both `WorkerInstanceGroup` and `WorkerGroupToken` tables were empty after initial deployment. The Regions page shows "Default worker instance group not found" with no option to create one.

2. What triggers the docker-provider to dequeue and process runs? It receives `SERVER_READY` but never seems to poll or receive run assignments.

3. Is the `SharedQueueConsumer` in the webapp supposed to read from `engine:runqueue:workerQueue:*` and forward to the provider? It reports `no_message_dequeued` despite messages existing in the queue.

4. Is there a missing env var or configuration step for self-hosted Production execution that isn't in the template? The official `docker-compose.yml` and `.env.example` don't mention worker groups at all.

## Compose structure

Using the official template structure adapted for Docker Swarm:
- webapp (ghcr.io/triggerdotdev/trigger.dev:v4.4.3)
- postgres (16)
- redis (7)
- electric (latest)
- docker-provider (ghcr.io/triggerdotdev/provider/docker:v4.4.3)
- coordinator (ghcr.io/triggerdotdev/coordinator:v4.4.3)

All on the same overlay network. Communication between services verified (HTTP + WebSocket).

## Environment variables (provider)

```
PLATFORM_HOST=trigger-webapp
PLATFORM_WS_PORT=3030
SECURE_CONNECTION=false
PLATFORM_SECRET=<set>
COORDINATOR_HOST=trigger-coordinator
COORDINATOR_PORT=9020
REGISTRY_HOST=registry.junior.pro
REGISTRY_NAMESPACE=dev/utmlab/trigger
REGISTRY_USERNAME=<set>
REGISTRY_PASSWORD=<set>
DATABASE_URL=postgresql://trigger:<pass>@trigger-postgres:5432/trigger
REDIS_HOST=trigger-redis
REDIS_PORT=6379
REDIS_PASSWORD=<set>
NODE_ENV=production
V3_ENABLED=true
RUNTIME_PLATFORM=docker-compose
```
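The socket URLs in the provider log (`/provider`, `/shared-queue`) can be cross-checked against the variables above. This is a sketch under one stated assumption: that `SECURE_CONNECTION=true` selects `wss://` and any other value selects `ws://`. `platform_socket_urls` is a hypothetical helper. It mirrors what the logs show and is not the provider's own code.

```python
from typing import Dict, List

# Socket paths observed in the docker-provider startup log in this issue.
SOCKET_PATHS = ("/provider", "/shared-queue")


def platform_socket_urls(env: Dict[str, str]) -> List[str]:
    """Build the webapp WebSocket URLs implied by the provider env vars.

    Assumption: SECURE_CONNECTION="true" selects wss://, anything else ws://.
    """
    secure = env.get("SECURE_CONNECTION", "false").lower() == "true"
    scheme = "wss" if secure else "ws"
    host = env["PLATFORM_HOST"]
    port = env["PLATFORM_WS_PORT"]
    return [f"{scheme}://{host}:{port}{path}" for path in SOCKET_PATHS]
```

With the values above, this yields `ws://trigger-webapp:3030/provider` and `ws://trigger-webapp:3030/shared-queue`, which match the provider log.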