
SOLVED - Unable to persist CrowdSec Local API key on container restart in Caddy. #3603

@aidenjanzen

Description


Unable to persist CrowdSec Local API key on container restart in Caddy.

I had a lot of trouble with this and tried multiple solutions over a couple of months. I wanted to post my solution on GitHub in case others run into the same issue.

I am running CrowdSec in a Docker container alongside a Caddy bouncer built with xcaddy from hslatman’s repository.

Initially, everything works fine:

Generate an API key with sudo docker exec crowdsec cscli bouncers add caddy-bouncer.

Add the key to the Docker Compose and Caddyfile (a sketch of where the key goes follows below).

Restart Caddy and CrowdSec.

And CrowdSec starts blocking malicious requests to Caddy.
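For reference, here is a minimal sketch of where the key ends up. This assumes hslatman’s caddy-crowdsec-bouncer module and its Caddyfile global options; CROWDSEC_API_KEY is just a hypothetical variable name, so adjust to your own setup:

# docker-compose.yml (caddy service), hypothetical variable name
environment:
  - CROWDSEC_API_KEY=<key from cscli bouncers add>

# Caddyfile, global options for the bouncer module
{
    crowdsec {
        api_url http://crowdsec:8080
        api_key {$CROWDSEC_API_KEY}
    }
}

# and inside a site block, enable the handler
route {
    crowdsec
}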

Issue:

However, after performing an update or running:

sudo docker compose down
sudo docker compose up

The API key appears invalid, and the following error occurs:

{"level":"error","ts":1730394008.01895,"logger":"crowdsec","msg":"auth-api: auth with api key failed return nil response, error: dial tcp [::1]:8089: connect: connection refused","instance_id":"a417f79a","address":"http://localhost:8089/","error":"auth-api: auth with api key failed return nil response, error: dial tcp [::1]:8089: connect: connection refused"}
{"level":"error","ts":1730394008.0189993,"logger":"crowdsec","msg":"failed to connect to LAPI, retrying in 10s: Get \"http://localhost:8089/v1/decisions/stream?startup=true\": dial tcp [::1]:8089: connect: connection refused","instance_id":"a417f79a","address":"http://localhost:8089/","error":"failed to connect to LAPI, retrying in 10s: Get \"http://localhost:8089/v1/decisions/stream?startup=true\": dial tcp [::1]:8089: connect: connection refused"}

This shows that the Caddy bouncer cannot connect to the CrowdSec Local API (LAPI) at container start.

As a workaround, I had to re-run cscli bouncers add caddy-bouncer (which generates a new API key), update the configs, and restart Caddy, as sketched below. This would happen multiple times a week.
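For completeness, the manual workaround looked roughly like this (a sketch; the bouncer name is whatever was registered originally):

# remove the stale registration and create a fresh key
sudo docker exec crowdsec cscli bouncers delete caddy-bouncer
sudo docker exec crowdsec cscli bouncers add caddy-bouncer

# copy the new key into the Caddyfile / compose environment, then
sudo docker compose restart caddy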

The goal was to avoid this manual intervention.

Key Findings:

  • Docker Setup:

I tested fixed IP addresses inside a custom Docker network with an explicit subnet for both services; this did not change the LAPI issue.

I made sure that data persistence was correctly set up through volumes:

./crowdsec/crowdsec-db:/var/lib/crowdsec/data/
./crowdsec/crowdsec-config:/etc/crowdsec/
  • Database:

The SQLite database (crowdsec.db) stores the bouncer entries, and they persist across restarts. You can verify this by querying the bouncers table inside the CrowdSec container (see the command sketch after this list); the output shows that the API key entry is unchanged after a container down/up.

  • Caddyfile:

Caddy was correctly configured to connect via http://crowdsec:8080 using the generated API key. I could tell because everything worked once the API key was reset: CrowdSec and Caddy connected fine and blocked malicious attempts.

  • Testing:

cscli lapi register did not solve the problem.
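To verify the database finding above, the easiest check is through cscli rather than raw SQL; run it before and after a docker compose down/up and compare (the sqlite3 line is only an option if the tool is available in the container):

sudo docker exec crowdsec cscli bouncers list

# optional direct query (assumes sqlite3 is installed in the container)
sudo docker exec crowdsec sqlite3 /var/lib/crowdsec/data/crowdsec.db "SELECT name, created_at FROM bouncers;"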

Root Cause:

The problem is a race condition during startup:

Docker Compose's depends_on only waits for the container, not for the service inside (CrowdSec LAPI) to be ready.

Caddy starts too early, tries to connect to LAPI, fails once, and never retries.

Although the database and the API key are working, the failed connection causes CrowdSec to refuse future authentications for that session.

Solution:

The fix is to delay Caddy's startup until CrowdSec’s LAPI is fully operational.

This is achieved by:

Adding a healthcheck to the CrowdSec container that waits for LAPI readiness.

Using depends_on: condition: service_healthy for the Caddy service.

Solution Compose:

services:
  crowdsec:
    container_name: crowdsec
    image: crowdsecurity/crowdsec:latest

 # ~~existing crowdsec compose ~~

    healthcheck:
      test:
        - CMD
        - cscli
        - lapi
        - status
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

  caddy:
    build:
      context: .
      dockerfile: ./Dockerfile
    depends_on:
      crowdsec:
        condition: service_healthy
    container_name: caddy

# ~~existing caddy compose ~~
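For context, the Dockerfile referenced in the caddy build section is essentially the standard xcaddy two-stage build; this is a sketch assuming the http handler module path from hslatman’s repository:

FROM caddy:builder AS builder
RUN xcaddy build \
    --with github.com/hslatman/caddy-crowdsec-bouncer/http

FROM caddy:latest
COPY --from=builder /usr/bin/caddy /usr/bin/caddy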

Why this works:

The database and API key always persisted correctly. The real problem was early connection failures, causing invalid authentication sessions.

By enforcing a health-based startup sequence, Caddy waits for LAPI to be ready, solving the issue without touching the key or manually re-registering.

Healthcheck:
test: ["CMD", "cscli", "lapi", "status"]: This command checks if the LAPI is running and responsive.
interval: 10s: Checks every 10 seconds.
timeout: 5s: Fails if the command takes longer than 5 seconds.
retries: 3: Retries 3 times before marking the container as unhealthy.
start_period: 30s: Gives CrowdSec 30 seconds to start the LAPI; health check failures during this window do not count toward the retry limit.

Depends On:
condition: service_healthy: The Caddy container waits until the CrowdSec container’s health check passes, which only happens once the LAPI is up and responding.
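To confirm the ordering behaves as intended after a cold start, standard Docker commands are enough:

sudo docker compose up -d

# health should go from "starting" to "healthy" before caddy is started
sudo docker inspect --format '{{.State.Health.Status}}' crowdsec

# caddy's log should show a successful LAPI connection and no "connection refused"
sudo docker compose logs caddy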

Further Reading:

One other user with the same issue:
https://forum.hhf.technology/t/unable-to-persist-crowdsec-local-api-key-on-container-restart-in-caddy-stack/

My CrowdSec Discord Support Ticket:
https://discord.com/channels/921520481163673640/1348151360708673687/1348151360708673687
