Workaround overflow errors with Numpy ≥ 2. #128
This change works around OverflowError regressions caught by the test suite with Numpy 2 and later, independently reported in [issue PacificBiosciences#127] and in [Debian bug #1095090], by restoring the former (potentially buggy) behavior with uncaught overflows. Issues manifested like:

      File "/home/shuaiw/miniconda3/envs/methy/lib/python3.12/site-packages/pbcore/io/align/_BamSupport.py", line 66, in rgAsInt
        return np.int32(int(rgIdString.split("/")[0], 16))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OverflowError: Python integer 3367457666 out of bounds for int32

and like:

        def _gapify(self, data, orientation, gapOp):
            if self.isUnmapped:
                return data

            # Precondition: data must already be *in* the specified orientation
            if data.dtype == np.int8:
                gapCode = ord("-")
            else:
    >           gapCode = data.dtype.type(-1)
    E           OverflowError: Python integer -1 out of bounds for uint8

While being at it, the change includes a fix proposal for situations where identifiers include hyphenations, fixing:

        return np.int32(int(rgIdString.split("/")[0], 16))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: invalid literal for int() with base 16: 'cb4d472d-100C60F6'

[issue PacificBiosciences#127]: PacificBiosciences#127
[Debian bug #1095090]: https://bugs.debian.org/1095090

Signed-Off-By: Étienne Mollier <[email protected]>
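The hyphenation failure described above can be reproduced without NumPy. What follows is only a minimal sketch: the helper name `parse_rg_hex` is hypothetical (it is not part of pbcore), and it uses `str.replace` where the patch uses `re.sub`.

```python
def parse_rg_hex(rg_id_string):
    """Parse the hexadecimal part of a read group ID, ignoring hyphens.

    Hypothetical helper for illustration only; not part of pbcore.
    """
    # Keep only the part before the first "/", as rgAsInt does.
    hex_part = rg_id_string.split("/")[0]
    # int(..., 16) rejects "-" inside the literal, so drop hyphens first.
    return int(hex_part.replace("-", ""), 16)


if __name__ == "__main__":
    rg_id = "cb4d472d-100C60F6"
    try:
        int(rg_id, 16)
    except ValueError as exc:
        print("unpatched parse fails:", exc)
    print(hex(parse_rg_hex(rg_id)))
```

Note that the parsed value of such an identifier is far larger than the int32 range, so the overflow handling discussed in this pull request still applies after the hyphen is removed.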
Thanks for taking a stab at fixing this - I'll investigate it on our end.
@emollier Actually I'm a little surprised that you were able to trigger this bug, since
Hi Nat,
Thanks for the status about upcoming fixes! I must admit I was
reluctant to send the patch at all given the nature of my fix,
and I'm glad there exists something probably more appropriate in
the pipeline.
About how I hit the bug despite the numpy version specification:
this was initially Debian bug #1095090[1] affecting the pbcore
module. Since we ship the operating system as a consistent
whole, there is only ever one version of a given Python module
that is available under the form of a Debian package, and that
includes numpy. Thus, when we moved to numpy 2 in unstable,
this triggered a campaign of porting all existing packaged
Python modules depending on numpy to make them compatible with
the newer version that introduced breaking changes. At no point
is pip involved in the packaging step; it would otherwise have
handled the multiple module versions.
Hoping this clarifies things,
Have a nice day, :)
Étienne.
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1095090
Oh, never mind that:
> Thanks for the status about upcoming fixes! I must admit I was
> reluctant to send the patch at all given the nature of my fix,
> and I'm glad there exists something probably more appropriate in
> the pipeline.
I just saw your edited comment about the fix.
pbcore/io/align/_BamSupport.py
@@ -67,7 +68,8 @@ def reverseComplementAscii(a):
 # qId calculation from RG ID string
 #
 def rgAsInt(rgIdString):
-    return np.int32(int(rgIdString.split("/")[0], 16))
+    return np.int32(int(re.sub("-", "", rgIdString.split("/")[0]), 16)
+                    % (np.iinfo(np.int32).max+1))
I don't think the modulus will give the intended result, which in the case of overflow will be a negative number. Based on my quick experimenting with one of the failing unit tests, if it overflows I think you need to take the modulus and then subtract np.iinfo(np.int32).max+1 again.
I had to scratch my head, but I think I see your point. The legacy behavior is that the overflow wraps around to np.iinfo(np.int32).min, whereas my implementation wraps around straight to 0. To be fully legacy compatible, my modulus wraparound is incorrect and I need to compensate so the index wraps between np.iinfo(np.int32).min and np.iinfo(np.int32).max instead of between 0 and np.iinfo(np.int32).max. Do I understand correctly?
I have pushed a commit that may clear things up. It looks like I botched my first attempt; I pushed a hotfix but am still running tests.
Sorry for my mess; the commit should now be in the state I intend. I have verified that the test suite passes.
The initial modulus resulted in values ranging from 0 to int32 max, but a correct implementation mimicking the legacy behavior should range from int32 min to int32 max, preferably without altering values that already fit within the range.

Signed-Off-By: Étienne Mollier <[email protected]>
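The difference between the two wrap-around ranges discussed in this review can be checked with plain integer arithmetic. The sketch below is an illustration only: the function names `wrap_modulus` and `wrap_legacy` are hypothetical, and `wrap_legacy` is one possible way to obtain a two's-complement style wrap, not necessarily the expression used in the pushed commit.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def wrap_modulus(value):
    """Modulus as in the first version of the diff: result is in 0..INT32_MAX."""
    return value % (INT32_MAX + 1)


def wrap_legacy(value):
    """Two's-complement style wrap: result is in INT32_MIN..INT32_MAX.

    Values already inside the int32 range are returned unchanged.
    """
    return (value - INT32_MIN) % (2 ** 32) + INT32_MIN


if __name__ == "__main__":
    # 3367457666 is the value from the OverflowError traceback.
    for value in (42, -5, INT32_MAX + 1, 3367457666):
        print(value, wrap_modulus(value), wrap_legacy(value))
```

For the value from the traceback, the plain modulus yields 1219974018, while the signed wrap yields -927509630, which is the modulus result minus np.iinfo(np.int32).max+1, as suggested in the review. The plain modulus also turns an in-range negative value such as -5 into 2147483643, which is the kind of alteration the commit message aims to avoid.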