
Improve convert retry handling #3904

@stchris

Our current retry logic for converting documents (shelling out to LibreOffice) is based on two constants, the number of retry attempts and the timeout: https://github.com/alephdata/ingest-file/blob/fca65fbb08ff37d65df3c14804ad5b1b6809b97d/ingestors/support/convert.py#L16-L17

A faster first failure would be more desirable, with the timeout growing on each retry up to a maximum.

For instance: right now we retry up to 5 times and time out after 3600s (1 hour) on each attempt. We could potentially get much better throughput by having a first timeout of 600s (10 minutes) which gets progressively larger (with a potential max cap). To illustrate:

TIMEOUT_START=600
TIMEOUT_INCREASE=900
TIMEOUT_MAX=3600
CONVERT_RETRIES=5

This would result in up to 5 retries with timeouts of 10, 25, 40, 55 and 60 minutes (the last step would be 70 minutes, but TIMEOUT_MAX caps it at 60). Ideally "stuck" convert tasks would time out much sooner and get queued up for a retry faster.
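A minimal sketch of how that schedule could be computed, to make the arithmetic concrete. The function names `timeout_for_attempt`, `timeout_schedule` and `run_with_retries` are hypothetical and not taken from ingest-file; the constants mirror the values proposed above, and the use of `subprocess.run` is an assumption about how the command would be invoked.

```python
import subprocess
from typing import List, Optional, Sequence

# Values proposed in this issue; the names are illustrative only.
TIMEOUT_START = 600      # seconds allowed for the first attempt
TIMEOUT_INCREASE = 900   # seconds added for each further attempt
TIMEOUT_MAX = 3600       # upper cap in seconds
CONVERT_RETRIES = 5


def timeout_for_attempt(attempt: int) -> int:
    """Timeout in seconds for a zero-based attempt number, capped at TIMEOUT_MAX."""
    return min(TIMEOUT_START + attempt * TIMEOUT_INCREASE, TIMEOUT_MAX)


def timeout_schedule(retries: int = CONVERT_RETRIES) -> List[int]:
    """All timeouts, in order, for the given number of attempts."""
    return [timeout_for_attempt(i) for i in range(retries)]


def run_with_retries(cmd: Sequence[str]) -> Optional[subprocess.CompletedProcess]:
    """Hypothetical helper: run cmd, giving each attempt a longer timeout.

    Returns the completed process, or None if every attempt timed out.
    """
    for attempt in range(CONVERT_RETRIES):
        try:
            return subprocess.run(cmd, timeout=timeout_for_attempt(attempt))
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; try again with more time.
            continue
    return None
```

Calling `timeout_schedule()` yields the timeouts in seconds for the five attempts. Whether the real change should retry in-process like this or re-queue the task is left open here.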

TODO
