Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround overflow errors with Numpy ≥ 2. #128

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

emollier
Copy link

This change works around OverflowError regressions caught by the test suite with Numpy 2 and later, independently reported in issue #127 and in Debian bug #1095090, by restoring the former (potentially buggy) behavior with uncaught overflows. Issues manifested like:

  File "/home/shuaiw/miniconda3/envs/methy/lib/python3.12/site-packages/pbcore/io/align/_BamSupport.py", line 66, in rgAsInt
    return np.int32(int(rgIdString.split("/")[0], 16))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 3367457666 out of bounds for int32

and like:

    def _gapify(self, data, orientation, gapOp):
        if self.isUnmapped:
            return data

        # Precondition: data must already be *in* the specified orientation
        if data.dtype == np.int8:
            gapCode = ord("-")
        else:
>           gapCode = data.dtype.type(-1)
E           OverflowError: Python integer -1 out of bounds for uint8

While being at it, the change includes a fix proposal for situations where identifiers include hyphenations, fixing:

    return np.int32(int(rgIdString.split("/")[0], 16))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 16: 'cb4d472d-100C60F6'

This change works around OverflowError regressions caught by the test
suite with Numpy 2 and later, independently reported in [issue PacificBiosciences#127]
and in [Debian bug #1095090], by restoring the former (potentially
buggy) behavior with uncaught overflows.  Issues manifested like:

	  File "/home/shuaiw/miniconda3/envs/methy/lib/python3.12/site-packages/pbcore/io/align/_BamSupport.py", line 66, in rgAsInt
	    return np.int32(int(rgIdString.split("/")[0], 16))
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	OverflowError: Python integer 3367457666 out of bounds for int32

and like:

	    def _gapify(self, data, orientation, gapOp):
	        if self.isUnmapped:
	            return data

	        # Precondition: data must already be *in* the specified orientation
	        if data.dtype == np.int8:
	            gapCode = ord("-")
	        else:
	>           gapCode = data.dtype.type(-1)
	E           OverflowError: Python integer -1 out of bounds for uint8

While being at it, the change includes a fix proposal for situations
where identifiers include hyphenations, fixing:

	    return np.int32(int(rgIdString.split("/")[0], 16))
	                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	ValueError: invalid literal for int() with base 16: 'cb4d472d-100C60F6'

[issue PacificBiosciences#127]: PacificBiosciences#127
[Debian bug #1095090]: https://bugs.debian.org/1095090

Signed-Off-By: Étienne Mollier <[email protected]>
@natechols
Copy link
Contributor

natechols commented Feb 28, 2025

Thanks for taking a stab at fixing this - I'll investigate it on our end. The identifier bug is fixed upstream, we're just slow to update this mirror. hold on wrong bug, this one is new. I'll take a look at that too.

@natechols
Copy link
Contributor

@emollier Actually I'm a little surprised that you were able to trigger this bug, since setup.py is pinned to a very narrow range of numpy versions (to keep our builds stable). Is there some packaging or installation step in your setup that is allowing the module dependencies to be ignored?

@emollier
Copy link
Author

emollier commented Feb 28, 2025 via email

@emollier
Copy link
Author

emollier commented Feb 28, 2025 via email

@@ -67,7 +68,8 @@ def reverseComplementAscii(a):
# qId calculation from RG ID string
#
def rgAsInt(rgIdString):
return np.int32(int(rgIdString.split("/")[0], 16))
return np.int32(int(re.sub("-", "", rgIdString.split("/")[0]), 16)
% (np.iinfo(np.int32).max+1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the modulus will give the intended result, which in the case of overflow will be a negative number. Based on my quick experimenting with one of the failing unit tests, if it overflows I think you need to take the modulus and then subtract np.iinfo(np.int32).max+1 again.

Copy link
Author

@emollier emollier Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to scratch my head, but I think I see your point. Legacy behavior is: the overflow wraps around at np.iinfo(np.int32).min. Meanwhile my implementation is: wrap around at 0 straight. To be fully legacy compatible, my modulus wraparound is incorrect and I need to compensate so the index wraps between np.iinfo(np.int32).min and np.iinfo(np.int32).max instead of 0 to np.iinfo(np.int32).max. Do I understand correctly?

Copy link
Author

@emollier emollier Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit, this may clear things. Looks like I botched my first attempt. I pushed a hotfix but am still running tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my mess, the commit should be in the state I intend now. I have verified the test suite passed.

The initial modulus resulted in values ranging from 0 to int32 max,
but a correct implementation mimicking the legacy should range from
int32 min to int32 max, preferrably without altering fitting values
already within the range.

Signed-Off-By: Étienne Mollier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants