
PixelProbe Configuration Guide

Complete reference for all PixelProbe configuration options, environment variables, and performance tuning.

Table of Contents

  • Environment Variables
  • Docker Compose Configuration
  • Database Configuration
  • Performance Tuning
  • Scanning Configuration
  • Celery Configuration
  • Data Retention Configuration
  • Security Configuration
  • Resource Recommendations
  • Configuration Best Practices
  • Examples
  • Troubleshooting Configuration

Environment Variables

All configuration is done via environment variables, either in a .env file or directly in docker-compose.yml.

Required Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `SECRET_KEY` | Flask session secret key (64 chars) | Generate with `python -c "import secrets; print(secrets.token_hex(32))"` |
| `POSTGRES_PASSWORD` | PostgreSQL database password | `your-secure-password` |
| `MEDIA_PATH` | Host path to media files (Docker only) | `/mnt/media` |

Database Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_HOST` | `localhost` | PostgreSQL server hostname |
| `POSTGRES_PORT` | `5432` | PostgreSQL server port |
| `POSTGRES_DB` | `pixelprobe` | Database name |
| `POSTGRES_USER` | `pixelprobe` | Database username |
| `POSTGRES_PASSWORD` | (required) | Database password |
| `DATABASE_ECHO` | `false` | Enable SQL query logging (debug) |

Application Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `FLASK_ENV` | `production` | Flask environment (`production`, `development`, `testing`) |
| `SCAN_PATHS` | `/media` | Comma-separated paths to scan inside the container |
| `TZ` | `UTC` | Timezone for timestamps (e.g., `America/New_York`) |
| `PORT` | `5000` | Web interface port |

Performance Variables

| Variable | Default | Description | Recommendations |
|----------|---------|-------------|-----------------|
| `MAX_WORKERS` | `10` | Parallel file scanning workers per task | 10-24 for most systems |
| `BATCH_SIZE` | `100` | Files per batch during discovery | 50-200 based on file sizes |
| `MAX_OUTPUT_SIZE` | `10000` | Max output characters before rotation | 10000-50000 |
| `OUTPUT_ROTATION_ENABLED` | `true` | Enable output truncation | `true` for large scans |
| `FREEZE_DETECTION_ENABLED` | `true` | Enable video freeze detection (freezedetect + blackdetect) | `false` to skip and reduce scan time |

Performance Notes:

  • MAX_WORKERS controls parallelism within each scan task
  • Each worker creates 1 database connection
  • Total connections = 60 (main app pool) + MAX_WORKERS
  • Keep under PostgreSQL max_connections (default: 100)
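The connection arithmetic above is easy to sanity-check. A minimal sketch, using the pool sizes documented in the Database Configuration section:

```python
# Estimate total PostgreSQL connections: 60 from the main app pool
# (20 base + 40 overflow) plus one connection per scan worker.
def total_connections(max_workers: int, app_pool: int = 60) -> int:
    return app_pool + max_workers

# The default MAX_WORKERS=10 stays well under PostgreSQL's
# default max_connections of 100:
print(total_connections(10))  # 70
```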

Celery Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CELERY_BROKER_URL` | `redis://localhost:6379/0` | Redis URL for task queue |
| `CELERY_RESULT_BACKEND` | `redis://localhost:6379/0` | Redis URL for results |
| `CELERY_CONCURRENCY` | `4` | Number of concurrent Celery tasks |
| `CELERY_LOG_LEVEL` | `INFO` | Celery log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |

Celery Notes:

  • CELERY_CONCURRENCY controls how many scan tasks run simultaneously
  • Independent from MAX_WORKERS (which controls parallelism within each task)
  • Recommended: 4-8 for most systems

Redis Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `REDIS_MAX_MEMORY` | `2gb` | Maximum Redis memory for the task queue |

Redis Notes:

  • For large libraries (1M+ files), increase to 4gb
  • Redis stores task queue and results temporarily
  • Uses noeviction policy to prevent task loss

Scanning Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `EXCLUDED_PATHS` | (empty) | Comma-separated paths to exclude from scanning |
| `EXCLUDED_EXTENSIONS` | `.txt,.log,.md` | Comma-separated file extensions to exclude |
| `PERIODIC_SCAN_SCHEDULE` | (empty) | Automated scan schedule (cron or interval format) |
| `CLEANUP_SCHEDULE` | (empty) | Automated cleanup schedule (cron or interval format) |

Schedule Format Examples:

# Cron format (standard cron syntax)
PERIODIC_SCAN_SCHEDULE=cron:0 2 * * *        # Daily at 2 AM
CLEANUP_SCHEDULE=cron:0 3 * * 0              # Weekly on Sunday at 3 AM

# Interval format
PERIODIC_SCAN_SCHEDULE=interval:hours:6      # Every 6 hours
CLEANUP_SCHEDULE=interval:days:7             # Every 7 days
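A hypothetical sketch of how these two formats could be parsed (PixelProbe's actual parser may differ; the field names below are assumptions):

```python
# Illustrative parser for the cron:/interval: schedule strings above.
def parse_schedule(value: str) -> dict:
    kind, _, rest = value.partition(":")
    if kind == "cron":
        minute, hour, day, month, weekday = rest.split()
        return {"type": "cron", "minute": minute, "hour": hour,
                "day": day, "month": month, "weekday": weekday}
    if kind == "interval":
        unit, _, count = rest.partition(":")  # e.g. "hours:6"
        return {"type": "interval", "unit": unit, "every": int(count)}
    raise ValueError(f"unrecognized schedule: {value!r}")

print(parse_schedule("cron:0 2 * * *"))
print(parse_schedule("interval:hours:6"))
```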

Data Retention Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SCAN_OUTPUT_RETENTION_DAYS` | `30` | Days before archiving scan outputs (currently disabled) |
| `REPORT_RETENTION_DAYS` | `90` | Days before deleting old reports |
| `SCAN_STATE_RETENTION_DAYS` | `7` | Days before deleting completed scan states |
| `LOG_RETENTION_DAYS` | `30` | Days before deleting old log entries (configurable via UI) |

Data Retention Notes:

  • Automated cleanup runs daily via Celery Beat
  • SCAN_OUTPUT_RETENTION_DAYS is currently not used (scan results kept forever)
  • Configurable via environment variables for future flexibility
  • LOG_RETENTION_DAYS default is stored in the app_configs database table and can be changed via the UI (System > View Logs) or API (PUT /api/logs/retention)

Monitoring Variables (Future)

| Variable | Default | Description |
|----------|---------|-------------|
| `ENABLE_MONITORING` | `false` | Enable Prometheus metrics endpoint |
| `METRICS_PORT` | `9090` | Metrics endpoint port |

Docker Compose Configuration

Basic Configuration

Minimal docker-compose.yml for production:

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    container_name: pixelprobe-postgres
    environment:
      POSTGRES_DB: pixelprobe
      POSTGRES_USER: pixelprobe
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U pixelprobe"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    container_name: pixelprobe-redis
    command: redis-server --maxmemory ${REDIS_MAX_MEMORY:-2gb} --maxmemory-policy noeviction
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  pixelprobe:
    image: ttlequals0/pixelprobe:latest
    container_name: pixelprobe-app
    environment:
      SECRET_KEY: ${SECRET_KEY}
      POSTGRES_HOST: postgres
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      CELERY_BROKER_URL: redis://redis:6379/0
      CELERY_RESULT_BACKEND: redis://redis:6379/0
      SCAN_PATHS: ${SCAN_PATHS:-/media}
      MAX_WORKERS: ${MAX_WORKERS:-10}
      TZ: ${TZ:-UTC}
    volumes:
      - ${MEDIA_PATH}:/media:ro
    ports:
      - "${PORT:-5000}:5000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  celery-worker:
    image: ttlequals0/pixelprobe:latest
    container_name: pixelprobe-celery-worker
    command: python celery_worker.py
    environment:
      CELERY_BROKER_URL: redis://redis:6379/0
      CELERY_RESULT_BACKEND: redis://redis:6379/0
      POSTGRES_HOST: postgres
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      SECRET_KEY: ${SECRET_KEY}
      MAX_WORKERS: ${MAX_WORKERS:-10}
      CELERY_CONCURRENCY: ${CELERY_CONCURRENCY:-4}
    volumes:
      - ${MEDIA_PATH}:/media:ro
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

volumes:
  postgres_data:

Multiple Scan Paths

To scan multiple directories:

Method 1: Multiple Volume Mounts

pixelprobe:
  environment:
    SCAN_PATHS: /movies,/tv-shows,/photos
  volumes:
    - /mnt/movies:/movies:ro
    - /mnt/tv-shows:/tv-shows:ro
    - /mnt/photos:/photos:ro

celery-worker:
  environment:
    SCAN_PATHS: /movies,/tv-shows,/photos
  volumes:
    - /mnt/movies:/movies:ro
    - /mnt/tv-shows:/tv-shows:ro
    - /mnt/photos:/photos:ro

Method 2: Single Parent Volume

pixelprobe:
  environment:
    SCAN_PATHS: /media/movies,/media/tv,/media/photos
  volumes:
    - /mnt/all-media:/media:ro

User Permissions (Important)

Both pixelprobe and celery-worker MUST run as the same user to access media files:

pixelprobe:
  user: "${PUID:-1000}:${PGID:-1000}"
  volumes:
    - ${MEDIA_PATH}:/media:ro

celery-worker:
  user: "${PUID:-1000}:${PGID:-1000}"  # MUST match pixelprobe
  volumes:
    - ${MEDIA_PATH}:/media:ro

Find your UID/GID:

id -u  # Shows UID (typically 1000)
id -g  # Shows GID (typically 1000)

Database Configuration

Connection Pool Settings

Configured in config.py:

SQLALCHEMY_ENGINE_OPTIONS = {
    'pool_size': 20,           # Base connection pool size
    'pool_pre_ping': True,     # Test connections before use
    'pool_recycle': 3600,      # Recycle connections after 1 hour
    'max_overflow': 40,        # Additional connections when pool exhausted
    'pool_timeout': 30,        # Timeout waiting for connection
}

Total Connections: 20 (base) + 40 (overflow) + MAX_WORKERS = 60 + MAX_WORKERS

PostgreSQL max_connections:

  • Default: 100 connections
  • Recommended: 150+ for production
  • Set in PostgreSQL: ALTER SYSTEM SET max_connections = 150; (takes effect only after a server restart)

Database Performance

For PostgreSQL optimization:

-- Increase shared buffers (25% of RAM; requires a server restart)
ALTER SYSTEM SET shared_buffers = '2GB';

-- Increase work memory for sorts
ALTER SYSTEM SET work_mem = '16MB';

-- Enable parallel queries
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

-- Reload the configuration (shared_buffers still requires a full restart)
SELECT pg_reload_conf();

Performance Tuning

Recommended Settings by System Size

Small Library (< 10,000 files)

MAX_WORKERS=4
CELERY_CONCURRENCY=2
BATCH_SIZE=50
REDIS_MAX_MEMORY=512mb

Medium Library (10,000 - 100,000 files)

MAX_WORKERS=10
CELERY_CONCURRENCY=4
BATCH_SIZE=100
REDIS_MAX_MEMORY=1gb

Large Library (100,000 - 1,000,000 files)

MAX_WORKERS=16
CELERY_CONCURRENCY=6
BATCH_SIZE=200
REDIS_MAX_MEMORY=2gb

Extra Large Library (1,000,000+ files)

MAX_WORKERS=24
CELERY_CONCURRENCY=8
BATCH_SIZE=200
REDIS_MAX_MEMORY=4gb

Resource Allocation

Docker resource limits for large libraries:

celery-worker:
  deploy:
    resources:
      limits:
        cpus: '8'          # Limit CPU cores
        memory: 8G         # Limit RAM
      reservations:
        cpus: '4'          # Guaranteed CPU cores
        memory: 4G         # Guaranteed RAM

Storage Performance

For best performance:

  1. Database Storage: SSD strongly recommended
  2. Media Storage: Can be HDD, but SSD improves scan speed
  3. Temp Files: Use tmpfs for temporary files (optional)

postgres:
  volumes:
    - /mnt/ssd/postgres_data:/var/lib/postgresql/data   # SSD for the database

pixelprobe:
  volumes:
    - /mnt/hdd/media:/media:ro                          # HDD OK for media
  tmpfs:
    - /tmp:size=1G                                      # tmpfs for temp files

Scanning Configuration

Exclusion Configuration

Exclude specific paths or file types from scanning:

Via Environment Variables:

EXCLUDED_PATHS=/media/temp,/media/incomplete,/media/.cache
EXCLUDED_EXTENSIONS=.tmp,.partial,.!qB,.part,.crdownload

Via Web Interface:

  1. Navigate to Tools > Exclusions
  2. Add paths or extensions
  3. Click Save
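To make the exclusion semantics concrete, here is an illustrative sketch of how rules of this shape might be applied during discovery (not PixelProbe's actual implementation):

```python
# Illustrative only: path prefixes and extensions like those set via
# EXCLUDED_PATHS / EXCLUDED_EXTENSIONS above.
EXCLUDED_PATHS = ["/media/temp", "/media/incomplete", "/media/.cache"]
EXCLUDED_EXTENSIONS = [".tmp", ".partial", ".part"]

def is_excluded(path: str) -> bool:
    in_excluded_dir = any(
        path == p or path.startswith(p.rstrip("/") + "/")
        for p in EXCLUDED_PATHS
    )
    has_excluded_ext = any(path.lower().endswith(e) for e in EXCLUDED_EXTENSIONS)
    return in_excluded_dir or has_excluded_ext

print(is_excluded("/media/temp/movie.mkv"))    # True  (excluded path)
print(is_excluded("/media/movies/clip.part"))  # True  (excluded extension)
print(is_excluded("/media/movies/clip.mkv"))   # False
```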

Schedule Configuration

Schedule automated scans:

Via Environment Variables:

# Daily full scan at 2 AM
PERIODIC_SCAN_SCHEDULE=cron:0 2 * * *

# Weekly cleanup on Sunday at 3 AM
CLEANUP_SCHEDULE=cron:0 3 * * 0

Via Web Interface:

  1. Navigate to Tools > Schedules
  2. Click "Create Schedule"
  3. Configure schedule type, frequency, and scan type
  4. Click Save

Celery Configuration

Worker Concurrency

Number of concurrent scan tasks:

# Low concurrency (memory-constrained systems)
CELERY_CONCURRENCY=2

# Medium concurrency (typical systems)
CELERY_CONCURRENCY=4

# High concurrency (powerful systems)
CELERY_CONCURRENCY=8

Task Prioritization

Celery queues and priorities are automatically configured:

  • Default Queue: Normal scans (priority 5)
  • Integrity Queue: Integrity checks (priority 7)
  • Cleanup Queue: Database cleanup (priority 6)
  • Retention Queue: Data retention (priority 9)

Beat Schedule (Automated Tasks)

Celery Beat runs scheduled tasks daily:

from celery.schedules import crontab

# Excerpt of the beat schedule; runs at 2 AM daily
app.conf.beat_schedule = {
    'data-retention-cleanup': {
        'task': 'pixelprobe.tasks.run_retention_cleanup',
        'schedule': crontab(hour=2, minute=0),
        'options': {'queue': 'retention', 'priority': 9},
    },
}

Data Retention Configuration

Retention Policies

Configure how long data is retained:

# Archive scan outputs after 30 days (currently disabled)
SCAN_OUTPUT_RETENTION_DAYS=30

# Delete old reports after 90 days
REPORT_RETENTION_DAYS=90

# Delete completed scan states after 7 days
SCAN_STATE_RETENTION_DAYS=7

Manual Cleanup

Run data retention manually:

# Docker
docker exec pixelprobe-app python tools/data_retention.py

# Manual installation
python tools/data_retention.py

Preview what would be cleaned:

python tools/data_retention.py --dry-run
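Each retention window reduces to a simple cutoff timestamp. A sketch of the idea (not the actual tool's logic):

```python
from datetime import datetime, timedelta, timezone

def retention_cutoff(days: int, now=None) -> datetime:
    """Records older than the returned timestamp are eligible for cleanup."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)

# With SCAN_STATE_RETENTION_DAYS=7, scan states created before this
# reference point minus 7 days would be deleted:
ref = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(retention_cutoff(7, ref))  # 2025-01-01 00:00:00+00:00
```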

Security Configuration

SSRF Trusted Hosts

PixelProbe includes SSRF protection that blocks outbound requests to private/reserved IP ranges. If you use internal services for healthchecks, notifications (ntfy, webhooks), or similar integrations that resolve to private IPs, you can allowlist them:

| Variable | Default | Description |
|----------|---------|-------------|
| `TRUSTED_INTERNAL_HOSTS` | (empty) | Comma-separated hostnames and/or CIDR ranges that bypass SSRF private-IP blocking |

Examples:

# Single hostname
TRUSTED_INTERNAL_HOSTS=healthcheck.internal.local

# Hostname + subnet
TRUSTED_INTERNAL_HOSTS=healthcheck.internal.local,192.168.5.0/24

# Multiple entries
TRUSTED_INTERNAL_HOSTS=healthcheck.internal.local,ntfy.internal.local,10.0.0.0/8

Notes:

  • Hostname matching is case-insensitive
  • CIDR ranges apply to resolved IPs regardless of hostname
  • A bare IP (e.g., 10.0.0.5) is treated as a /32 single-host range
  • Must be set in both pixelprobe and celery-worker containers (or via shared .env)
  • Public IPs are always allowed; this setting only affects private/reserved ranges
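The matching rules above can be sketched as follows (an illustration of the documented semantics — case-insensitive hostnames, CIDR ranges against resolved IPs, bare IPs as /32 — not PixelProbe's actual code):

```python
import ipaddress

def is_trusted(hostname: str, resolved_ip: str, trusted_hosts: str) -> bool:
    for entry in (e.strip() for e in trusted_hosts.split(",") if e.strip()):
        try:
            # CIDR range, or a bare IP treated as a /32 single-host range
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not an IP/CIDR: treat as a hostname (case-insensitive)
            if hostname.lower() == entry.lower():
                return True
        else:
            if ipaddress.ip_address(resolved_ip) in network:
                return True
    return False

print(is_trusted("ntfy.internal.local", "10.0.0.5", "10.0.0.0/8"))  # True
print(is_trusted("Healthcheck.Internal.Local", "192.168.1.9",
                 "healthcheck.internal.local"))                     # True
print(is_trusted("other.local", "192.168.9.9", "192.168.5.0/24"))   # False
```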

Secret Key Generation

Generate a secure secret key:

python3 -c "import secrets; print(secrets.token_hex(32))"

Copy the output to SECRET_KEY in .env.
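token_hex(32) returns 32 random bytes encoded as 64 hexadecimal characters, which satisfies the 64-char requirement above:

```python
import secrets

key = secrets.token_hex(32)  # 32 bytes -> 64 hex characters
print(len(key))  # 64
assert all(c in "0123456789abcdef" for c in key)
```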

Session Configuration

Session settings are configured in Flask:

from datetime import timedelta

# Session timeout (default: 30 days)
PERMANENT_SESSION_LIFETIME = timedelta(days=30)

# Session cookie settings
SESSION_COOKIE_SECURE = True    # HTTPS only (production)
SESSION_COOKIE_HTTPONLY = True  # Prevent JavaScript access
SESSION_COOKIE_SAMESITE = 'Lax' # CSRF protection

API Token Authentication

Users can generate API tokens via:

  1. Web UI: Account > API Tokens
  2. API: POST /api/auth/tokens

Tokens support optional expiration dates.

Resource Recommendations

CPU Requirements

| Library Size | Minimum CPUs | Recommended CPUs |
|--------------|--------------|------------------|
| < 10K files | 2 cores | 4 cores |
| 10K-100K files | 4 cores | 8 cores |
| 100K-1M files | 8 cores | 16 cores |
| 1M+ files | 16 cores | 32 cores |

Memory Requirements

| Library Size | Minimum RAM | Recommended RAM |
|--------------|-------------|-----------------|
| < 10K files | 2 GB | 4 GB |
| 10K-100K files | 4 GB | 8 GB |
| 100K-1M files | 8 GB | 16 GB |
| 1M+ files | 16 GB | 32 GB |

Disk Requirements

  • Database: 100 MB per 10,000 files (estimated)
  • Logs: 1-10 GB (depending on retention)
  • Reports: 100 MB per 1,000 reports
  • Temp Files: 1-2 GB during scans
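The database estimate above (100 MB per 10,000 files) scales linearly, so sizing is a one-liner:

```python
# Back-of-envelope database sizing from the estimate above.
def estimated_db_size_mb(file_count: int) -> float:
    return file_count / 10_000 * 100

print(estimated_db_size_mb(100_000))    # 1000.0 MB (~1 GB)
print(estimated_db_size_mb(1_000_000))  # 10000.0 MB (~10 GB)
```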

Network Requirements

  • Bandwidth: Minimal (local file access)
  • Latency: Low latency to database required
  • Ports: 5000 (web), 5432 (postgres), 6379 (redis)

Configuration Best Practices

  1. Start Conservative: Begin with default settings and increase gradually
  2. Monitor Resources: Use docker stats to monitor CPU/memory usage
  3. Test Changes: Test configuration changes on a subset of files first
  4. Document Settings: Keep notes on what works for your environment
  5. Regular Backups: Backup database and configuration regularly
  6. Security First: Use strong passwords and keep SECRET_KEY secure
  7. Update Regularly: Pull latest images for bug fixes and improvements

Examples

Home Media Server (20K files)

MAX_WORKERS=8
CELERY_CONCURRENCY=3
BATCH_SIZE=100
REDIS_MAX_MEMORY=1gb
POSTGRES_PASSWORD=strong-password-here
SCAN_PATHS=/movies,/tv

Professional Archive (500K files)

MAX_WORKERS=20
CELERY_CONCURRENCY=6
BATCH_SIZE=200
REDIS_MAX_MEMORY=4gb
POSTGRES_PASSWORD=very-strong-password
SCAN_PATHS=/archive/video,/archive/images
OUTPUT_ROTATION_ENABLED=true
MAX_OUTPUT_SIZE=50000

Multi-User Production (2M files)

MAX_WORKERS=24
CELERY_CONCURRENCY=8
BATCH_SIZE=200
REDIS_MAX_MEMORY=8gb
POSTGRES_PASSWORD=enterprise-strength-password
SCAN_PATHS=/storage/media1,/storage/media2,/storage/media3
ENABLE_MONITORING=true

Troubleshooting Configuration

See TROUBLESHOOTING.md for solutions to common configuration issues.