
[FIX] recompute_fields: get ids in batches #303


Closed
jjmaksoud wants to merge 1 commit

Conversation

jjmaksoud
Contributor

If no ids are given to recompute, the util fetches all ids in the target table and then recomputes in chunks. Fetching all the ids at once can itself cause a memory error if the table is too large.
Using a named cursor that fetches at most 1M records at a time eliminates this possibility.

opw-4909458
upg-3106002
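
For context, the fix relies on a PostgreSQL server-side ("named") cursor, which streams the result set in batches instead of materializing all rows in client memory. A minimal sketch of the idea in raw psycopg2, outside the upgrade-util helpers (the connection string, cursor name, and table are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection
# Passing a name makes this a server-side cursor: rows stay on the
# server and are streamed to the client `itersize` rows at a time.
with conn.cursor(name="recompute_ids") as ncr:
    ncr.itersize = 1_000_000  # plays the role of the 1M-record limit
    ncr.execute("SELECT id FROM some_table")  # placeholder table
    for (id_,) in ncr:
        pass  # client memory stays bounded regardless of table size
conn.close()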

@robodoo
Contributor

robodoo commented Aug 21, 2025

Pull request status dashboard

@jjmaksoud
Contributor Author

upgradeci retry with always only account

@KangOl
Contributor

KangOl commented Aug 21, 2025

This has the disadvantage of splitting the log_progress.

I would keep a single loop:

if ids is None:
    cr.execute("SELECT COUNT(id) FROM ...")
    [count] = cr.fetchone()
else:
    count = len(ids)

if strategy == "auto":
    big_table = count > BIG_TABLE_THRESHOLD
    ...

size = (count + chunk_size - 1) // chunk_size  # ceiling division: number of chunks

def get_ids():
    if ids is not None:
        yield from ids
        return  # don't fall through to the full-table query
    MAX_SIZE = 1000000
    with named_cursor(cr, MAX_SIZE) as ncr:
        ncr.execute("SELECT id FROM ...")
        for (id_,) in ncr:
            yield id_

for subids in log_progress(chunks(get_ids(), chunk_size, list), logger, qualifier=qual, size=size):
    ...
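
(For readers outside the codebase: `chunks` above is an upgrade-util helper. A rough stand-in with the same shape, assuming it groups an iterable into batches of at most `size` items materialized with `fmt`, could look like this.)

from itertools import islice

def chunks(iterable, size, fmt=list):
    # Hypothetical stand-in for the upgrade-util `chunks` helper:
    # yield successive groups of at most `size` items from `iterable`.
    it = iter(iterable)
    while chunk := fmt(islice(it, size)):
        yield chunk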

@jjmaksoud force-pushed the master-recompute-memory-maji branch from 86b8e00 to a82fb9d on August 21, 2025 13:19
@jjmaksoud
Contributor Author

all comments applied

If no ids are given to recompute, the util will fetch
all ids in the target table and then recompute in chunks.
Fetching all the ids itself can cause a memory error
if the table is too large.
Using a named cursor with a limit of 1M records to fetch
can eliminate this possibility.
@jjmaksoud force-pushed the master-recompute-memory-maji branch from a82fb9d to 321eba9 on August 21, 2025 13:41
@aj-fuentes (Contributor) left a comment


LGTM

@KangOl (Contributor) left a comment


@robodoo r+ priority
