
[FIX] recompute_fields: get ids in batches #303


Closed
jjmaksoud wants to merge 1 commit

Conversation

jjmaksoud
Contributor

If no ids are given to recompute, the util fetches all ids in the target table and then recomputes in chunks. Fetching all the ids at once can itself cause a memory error if the table is too large.
Using a named cursor that fetches at most 1M records at a time eliminates this possibility.

opw-4909458
upg-3106002
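
For context, the fix relies on a PostgreSQL server-side ("named") cursor, which streams the result set in batches instead of materializing all rows in client memory. A minimal sketch of the idea in raw psycopg2, outside the upgrade-util helpers (the connection string, cursor name, and table are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection
# Passing a name makes this a server-side cursor: rows stay on the
# server and are streamed to the client `itersize` rows at a time.
with conn.cursor(name="recompute_ids") as ncr:
    ncr.itersize = 1_000_000  # plays the role of the 1M-record limit
    ncr.execute("SELECT id FROM some_table")  # placeholder table
    for (id_,) in ncr:
        pass  # client memory stays bounded regardless of table size
conn.close()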

@robodoo
Contributor

robodoo commented Aug 21, 2025

Pull request status dashboard

@jjmaksoud
Contributor Author

upgradeci retry with always only account

@KangOl
Contributor

KangOl commented Aug 21, 2025

This has the disadvantage of splitting the log_progress.

I would keep a single loop:

if ids is None:
    cr.execute("SELECT COUNT(id) FROM ...")
    [count] = cr.fetchone()
else:
    count = len(ids)

if strategy == "auto":
    big_table = count > BIG_TABLE_THRESHOLD
    ...

size = (count + chunk_size - 1) // chunk_size  # ceiling division: number of chunks

def get_ids():
    if ids is not None:
        yield from ids
        return  # don't fall through to the full-table query
    MAX_SIZE = 1000000
    with named_cursor(cr, MAX_SIZE) as ncr:
        ncr.execute("SELECT id FROM ...")
        for (id_,) in ncr:
            yield id_

for subids in log_progress(chunks(get_ids(), chunk_size, list), logger, qualifier=qual, size=size):
    ...
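
(For readers outside the codebase: `chunks` above is an upgrade-util helper. A rough stand-in with the same shape, assuming it groups an iterable into batches of at most `size` items materialized with `fmt`, could look like this.)

from itertools import islice

def chunks(iterable, size, fmt=list):
    # Hypothetical stand-in for the upgrade-util `chunks` helper:
    # yield successive groups of at most `size` items from `iterable`.
    it = iter(iterable)
    while chunk := fmt(islice(it, size)):
        yield chunk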

@jjmaksoud force-pushed the master-recompute-memory-maji branch from 86b8e00 to a82fb9d on August 21, 2025 13:19
@jjmaksoud
Contributor Author

all comments applied

If no ids are given to recompute, the util will fetch
all ids in the target table and then recompute in chunks.
Fetching all the ids itself can cause a memory error
if the table is too large.
Using a named cursor with a limit of 1M records to fetch
can eliminate this possibility.
@jjmaksoud force-pushed the master-recompute-memory-maji branch from a82fb9d to 321eba9 on August 21, 2025 13:41
@aj-fuentes (Contributor) left a comment


LGTM

@KangOl (Contributor) left a comment


@robodoo r+ priority
