Description
Our current retry logic for converting documents (shelling out to LibreOffice) is based on two constants: the number of retry attempts and the timeout https://github.com/alephdata/ingest-file/blob/fca65fbb08ff37d65df3c14804ad5b1b6809b97d/ingestors/support/convert.py#L16-L17
It would be more desirable to fail fast on the first attempt, with a timeout that grows on each retry up to a maximum.
For instance: right now we retry up to 5 times and time out after 3600s (1 hour). We could potentially get much better throughput by having a first timeout after 600s (10 minutes) which gets progressively larger (with a potential max cap). To illustrate:
TIMEOUT_START=600
TIMEOUT_INCREASE=900
TIMEOUT_MAX=3600
CONVERT_RETRIES=5
This would result in up to 5 retries with timeouts of 10, 25, 40, 55 and 60 minutes. Ideally "stuck" convert tasks would time out much sooner and get queued up for a retry faster.
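A minimal sketch of how such a schedule could be computed and applied, assuming a linear increase capped at the maximum. The names `timeout_schedule` and `run_with_retries` are hypothetical and are not part of ingest-file; the command passed in stands in for the LibreOffice invocation.

```python
import subprocess

# Values from the proposal above (seconds).
TIMEOUT_START = 600
TIMEOUT_INCREASE = 900
TIMEOUT_MAX = 3600
CONVERT_RETRIES = 5


def timeout_schedule(start=TIMEOUT_START, increase=TIMEOUT_INCREASE,
                     maximum=TIMEOUT_MAX, retries=CONVERT_RETRIES):
    """Return one timeout per attempt, growing linearly and capped at maximum."""
    return [min(start + attempt * increase, maximum) for attempt in range(retries)]


def run_with_retries(cmd, timeouts):
    """Run cmd once per timeout until it finishes in time (hypothetical helper).

    Returns the 1-based attempt number that succeeded. A non-zero exit code
    raises subprocess.CalledProcessError and is not retried here.
    """
    for attempt, timeout in enumerate(timeouts, start=1):
        try:
            subprocess.run(cmd, timeout=timeout, check=True)
            return attempt
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run; try again with the next timeout.
            continue
    raise RuntimeError("command timed out on every attempt")


schedule = timeout_schedule()
print([seconds // 60 for seconds in schedule])  # [10, 25, 40, 55, 60]
```

With this shape, a stuck conversion costs 10 minutes on the first attempt instead of 60, while documents that are slow but healthy still get the full hour on the last attempt.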
TODO
- try to get some data on the average (and maximum?) time it takes to convert a document
- make the timeout and retry logic respect their respective configured settings.