
High resource usage problem. Memory leak #3782

@perrfect


Describe the bug
I have a physical server without virtualization and run Aleph on it via docker compose.
Server resources:
RAM 512 GB
CPU 128
LVM 10 TB

I'm trying to upload files to the server (about 40 GB in total). After some time the upload process crashes because the server has run out of resources.
In top I can see that the soffice.bin process is using all of the RAM.
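
To put numbers on what top shows, resident memory can be totalled per process name. The script below is a minimal sketch and not part of Aleph: the function names are hypothetical, and it assumes a Linux host with the procps `ps` command, where container processes are visible from the host.

```python
import subprocess
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum resident memory (KiB) per command name.

    Expects lines of the form "<rss_kib> <command>", as printed by
    `ps -e -o rss= -o comm=`. Lines that do not match are skipped.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss_kib, command = parts
        totals[command.strip()] += int(rss_kib)
    return dict(totals)


def current_rss_by_command():
    """Run ps on this host and return the per-command RSS totals in KiB."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return rss_by_command(result.stdout)


def report(limit=5):
    """Print the commands with the largest total RSS, converted to GiB."""
    totals = current_rss_by_command()
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    for command, kib in ranked[:limit]:
        print(f"{command:<20} {kib / 1024 / 1024:8.2f} GiB")
```

Calling `report()` on the host at intervals during an upload should show whether the total for soffice.bin keeps growing or levels off.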

To Reproduce

  1. Run Aleph via docker compose.
  2. Upload files of different types (xls, xlsx, pdf, zip, etc.).

Expected behavior
The upload process should complete successfully and soffice.bin should not use all server resources.

Aleph version
3.15.0

Additional context

The docker-compose.yml file

version: "3.2"

services:

  postgres:
    image: postgres:10.0
    env_file: ./aleph.env
    command: postgres -c 'max_connections=2000'
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: always

  elasticsearch:
    image: ghcr.io/alephdata/aleph-elasticsearch:3bb5dbed97cfdb9955324d11e5c623a5c5bbc410
    hostname: elasticsearch
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms16g -Xmx16g"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    restart: always
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65535
        hard: 65535

  redis:
    image: redis:alpine
    restart: always
    command: [ "redis-server", "--save", "3600", "10" ]
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  ingest-file:
    image: ghcr.io/alephdata/ingest-file:3.19.1
    restart: always
    tmpfs:
      - /tmp:mode=777
    volumes:
      - archive-data:/data
    depends_on:
      - postgres
      - redis
    env_file: ./aleph.env

  worker:
    image: ghcr.io/alephdata/aleph:${ALEPH_TAG:-3.15.0}
    restart: always
    command:
      - /bin/bash
      - -c
      - |
        aleph upgrade
        aleph worker
    depends_on:
      - postgres
      - elasticsearch
      - redis
      - ingest-file
    tmpfs:
      - /tmp
    volumes:
      - archive-data:/data
    env_file: ./aleph.env

  shell:
    image: ghcr.io/alephdata/aleph:${ALEPH_TAG:-3.15.0}
    command: /bin/bash
    depends_on:
      - postgres
      - elasticsearch
      - redis
      - ingest-file
      - worker
    tmpfs:
      - /tmp
    volumes:
      - archive-data:/data
      - "./mappings:/aleph/mappings"
      - "~:/host"
    env_file: ./aleph.env

  api:
    image: ghcr.io/alephdata/aleph:${ALEPH_TAG:-3.15.0}
    restart: always
    expose:
      - 8000
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - elasticsearch
      - redis
      - worker
      - ingest-file
    tmpfs:
      - /tmp
    volumes:
      - archive-data:/data
    env_file: ./aleph.env

  ui:
    image: ghcr.io/alephdata/aleph-ui-production:${ALEPH_TAG:-3.15.0}
    restart: always
    depends_on:
      - api
    ports:
      - "8080:8080"

volumes:
  archive-data: {}
  postgres-data: {}
  redis-data: {}
  elasticsearch-data: {}

aleph.env file


# Aleph environment configuration
#
# This file is loaded by docker-compose and transformed into a set of
# environment variables inside the containers. These are, in turn, parsed
# by aleph and used to configure the system.

POSTGRES_USER=test
POSTGRES_PASSWORD=test
POSTGRES_DATABASE=test

# Random string:
ALEPH_SECRET_KEY=some_secret

# Visible instance name in the UI
ALEPH_APP_TITLE=Aleph
ALEPH_APP_NAME=aleph
ALEPH_UI_URL=http://10.10.10.10:8080/

[email protected]

ALEPH_SINGLE_USER=false

ARCHIVE_TYPE=s3
ARCHIVE_BUCKET=aleph
AWS_ACCESS_KEY_ID=some_key_id
AWS_SECRET_ACCESS_KEY=some_secret_key
ARCHIVE_ENDPOINT_URL=http://10.10.10.20:9000
AWS_SECURE=false

ELASTICSEARCH_TLS_VERIFY_CERTS=0

ALEPH_OCR_DEFAULTS=eng

ALEPH_DEBUG=true

LOG_FORMAT=JSON  # TEXT or JSON

PROMETHEUS_ENABLED=true
PROMETHEUS_MULTIPROC_DIR=/data

WORKER_THREADS=0

Labels: support (to track support requests)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions