VbhvGupta

Summary

This change fixes a bug in tarfile.py where negative offset and numbytes values were not properly handled when parsing GNU sparse file headers. The existing validation only checked for non-zero values, allowing negative integers to pass, which could lead to errors or a potential security vulnerability when processing a corrupted or malicious tar archive.

The problem with the original code was twofold:

  • It incorrectly allowed negative numbers, which was the critical issue.
  • It incorrectly disallowed zero, which is a valid value for both offset and numbytes according to the tar specification. (ref)

The fix introduces a direct check to ensure these values are non-negative (>= 0) before they are used, making the parsing logic more robust.
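To make the difference concrete, here is a minimal sketch of the two validation styles described above. It is not the code in tarfile.py: `old_check` and `new_check` are hypothetical stand-ins, and the sample values are made up for illustration.

```python
def old_check(offset: int, numbytes: int) -> bool:
    """Hypothetical stand-in for a truthiness-style check.

    Any non-zero integer is truthy, so negative values pass
    while zero is rejected.
    """
    return bool(offset and numbytes)


def new_check(offset: int, numbytes: int) -> bool:
    """Hypothetical stand-in for the non-negative check described above."""
    return offset >= 0 and numbytes >= 0


# Compare both checks on a few illustrative (offset, numbytes) pairs.
for pair in [(0, 0), (512, 1024), (-512, 1024), (512, -1)]:
    print(pair, "old:", old_check(*pair), "new:", new_check(*pair))
```

Under these assumptions, the pair `(-512, 1024)` passes the truthiness-style check but fails the `>= 0` check, while `(0, 0)` does the opposite.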

Additional Context

During the discussion in the issue, a more comprehensive refactoring of the module's integer handling and error reporting was suggested. While that is a valid long-term improvement, this pull request provides a focused, immediate, and safe fix for the specific vulnerability identified. The broader refactoring can be considered separately.

@python-cla-bot

python-cla-bot bot commented Sep 10, 2025

All commit authors signed the Contributor License Agreement.

@bedevere-app

bedevere-app bot commented Sep 10, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Member

@ZeroIntensity ZeroIntensity left a comment


There is already an open PR for this issue: #137805. Please review that one instead.

@bedevere-app

bedevere-app bot commented Sep 16, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz
Member

picnixz commented Sep 16, 2025

This PR ignores negative offsets, while the other rejects them. I would prefer rejecting them when it makes sense (because it is possible to allow them), but main currently stops upon finding ill-formed ones (it does a break). I would, however, prefer one PR for the issue, not two.

Ideally, I would like to raise an error if we encounter an invalid offset rather than breaking the loop and keeping what we read, but we should carefully check what the specs say. As such, I'm going to close this one.
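The three behaviors mentioned in this comment (ignoring a bad entry, stopping at it, or raising an error) can be sketched as follows. This is only an illustration under assumptions: the helper names, the `ValueError` exception type, and the sample data are all hypothetical, and none of this is taken from tarfile.py or from either pull request.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (offset, numbytes)


def is_valid(offset: int, numbytes: int) -> bool:
    """Hypothetical validity rule: both values must be non-negative."""
    return offset >= 0 and numbytes >= 0


def collect_skip(pairs: Iterable[Pair]) -> List[Pair]:
    """Ignore ill-formed entries and keep going."""
    return [p for p in pairs if is_valid(*p)]


def collect_break(pairs: Iterable[Pair]) -> List[Pair]:
    """Stop at the first ill-formed entry, keeping what was read so far."""
    result: List[Pair] = []
    for p in pairs:
        if not is_valid(*p):
            break
        result.append(p)
    return result


def collect_strict(pairs: Iterable[Pair]) -> List[Pair]:
    """Reject the whole header by raising on the first ill-formed entry."""
    result: List[Pair] = []
    for offset, numbytes in pairs:
        if not is_valid(offset, numbytes):
            raise ValueError(f"invalid sparse entry: {(offset, numbytes)!r}")
        result.append((offset, numbytes))
    return result


sample = [(0, 512), (-1, 512), (1024, 512)]
print(collect_skip(sample))   # keeps the first and third entries
print(collect_break(sample))  # keeps only the first entry
try:
    collect_strict(sample)
except ValueError as exc:
    print("rejected:", exc)
```

With this sample, skipping silently drops the middle entry, breaking silently truncates the map, and only the strict variant makes the malformed header visible to the caller.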

@picnixz picnixz closed this Sep 16, 2025

Successfully merging this pull request may close these issues.

Guard against negative offset/length values in tarfile's GNU sparse extraction
3 participants