).substitute(page=page)
groups_paged = await self.api("get", url, token=token)
groups += groups_paged
if len(groups) < 100:
Potential bug: Incorrect pagination logic causes infinite loops and data loss when listing GitLab repos.
- Description: The pagination loop for fetching GitLab groups has a flawed break condition. It checks the total number of accumulated groups (`len(groups)`) instead of the number of groups returned on the current page (`len(groups_paged)`). This can cause an infinite loop if an organization has enough groups to fill multiple pages (e.g., 100 or more), leading to resource exhaustion and a potential service crash. It can also lead to data loss by terminating prematurely if the first page has fewer than 100 groups.
- Suggested fix: The break condition should check the number of items on the current page, not the accumulated total. Change `if len(groups) < 100:` to `if len(groups_paged) < 100:`.
severity: 0.9, confidence: 1.0
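For illustration, the suggested fix could look roughly like the sketch below. This is not the code from the PR: `list_all_groups`, `make_fake_fetch`, and the `fetch_page` callable are hypothetical stand-ins for the `self.api("get", url, token=token)` call, and the page size of 100 is assumed from the comparison in the diff.

```python
import asyncio

PER_PAGE = 100  # assumed page size, inferred from the `< 100` check in the diff


async def list_all_groups(fetch_page, per_page=PER_PAGE):
    """Collect every page; stop when the *current* page comes back short."""
    groups = []
    page = 1
    while True:
        # fetch_page is a hypothetical stand-in for the real API call
        groups_paged = await fetch_page(page)
        groups += groups_paged
        # Compare the size of this page, not the accumulated total
        if len(groups_paged) < per_page:
            break
        page += 1
    return groups


def make_fake_fetch(total, per_page=PER_PAGE):
    """Build an in-memory fake API that serves `total` items page by page."""
    async def fetch_page(page):
        start = (page - 1) * per_page
        return list(range(start, min(start + per_page, total)))
    return fetch_page


if __name__ == "__main__":
    groups = asyncio.run(list_all_groups(make_fake_fetch(250)))
    print(len(groups))
```

With the original `len(groups) < 100` check, the same loop would never exit once the accumulated total reached 100.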
This pagination logic was taken from elsewhere in the existing GitLab code; I opted for it to stay consistent with existing standards. The GitLab API responses return pagination headers that we should be using across the board to handle pagination properly, but I view that as out of scope for this PR, since we are following existing methods.
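As a rough sketch of the header-based approach mentioned above: GitLab's offset pagination returns an `X-Next-Page` response header that is empty on the last page. The helper names here (`list_all_groups_by_header`, `make_fake_header_fetch`) and the `(body, headers)` return shape of `fetch_page` are assumptions for illustration, not the existing client's interface.

```python
import asyncio


async def list_all_groups_by_header(fetch_page):
    """Follow the X-Next-Page header until the server reports no next page."""
    groups = []
    page = "1"
    while page:
        # fetch_page is hypothetical: it returns (parsed JSON body, headers dict)
        body, headers = await fetch_page(page)
        groups.extend(body)
        # An empty or missing header value means this was the last page.
        # Real HTTP headers are case-insensitive; a plain dict is not.
        page = headers.get("X-Next-Page", "")
    return groups


def make_fake_header_fetch(total, per_page=100):
    """In-memory fake that mimics the X-Next-Page behaviour for testing."""
    async def fetch_page(page):
        number = int(page)
        start = (number - 1) * per_page
        body = list(range(start, min(start + per_page, total)))
        has_more = start + per_page < total
        return body, {"X-Next-Page": str(number + 1) if has_more else ""}
    return fetch_page


if __name__ == "__main__":
    groups = asyncio.run(list_all_groups_by_header(make_fake_header_fetch(250)))
    print(len(groups))
```

Following the header removes the dependency on a hard-coded page size, which is the main appeal over comparing lengths against 100.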
@nathanbrophy there's actually a Python GitLab library that abstracts a bunch of this away, which could be a useful solution.
Agreed. I think a good follow-up would be to replace the GitLab client here with the published, GitLab-sponsored one.
This PR updates the GitLab Enterprise / Community Edition handling of the `list_repo` call. This call has a bug: the call to `list_repos_get_user_and_groups` in the GitLab API is not properly paginated. This means that any GitLab instance with more than 100 groups will not load properly into Codecov. As a result, after a repo sync from the UI, the user still cannot view all the repos for the configured instance, even when using the bot token config. Adding pagination to this call fixes that behavior.
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.