Skip to content

Conversation

@SwayamInSync
Copy link
Member

closes numpy/numpy-quaddtype#26

As per the title

Copy link
Contributor

@juntyr juntyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor nits addressed

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip any chance you have time and interest to look this over?

@SwayamInSync
Copy link
Member Author

Details for easing the review process

Only casts.cpp and test_quaddtype requires the major review, I added comments wherever I made the changes to give reasoning for that step, the rest files are modified for the caues listed below

Small refactoring

  • quad_value union with __float128 causes lot of ABI & compiler optimization issues in C++, hence refactoring the methods to not perform any kind of copy of this passed union instead use pointers
  • Inter-backend operation is broken in previous builds, not sure why someone wants to go in that direction, but just for the sake of completeness I fixed it here as well, to implement the same_value casting between QuadPrecision with different backends
  • Remove the aligned/unlaigned separate loops with their templated versions

This is still valid, outside this are just tweaks and refactoring to make things work in stable manner

@mattip
Copy link
Member

mattip commented Dec 29, 2025

I can take a look after Jan 11.

@ngoldbaum
Copy link
Member

@SwayamInSync do you mind waiting a couple weeks on this? I'd like to have Matti look because he added same_value casting and also has a try at this in #161.

In the meantime: one thing that would help is to make the PR more atomic. There's a fair amount of renaming and other stuff like that you could do in another PR then rebase this once the other PR is merged to reduce the diff size.

In general I'd much rather review several smaller more focused PRs rather than one big PR.

@juntyr
Copy link
Contributor

juntyr commented Dec 30, 2025

I was hoping for a faster timeline to the release (since numpy-quaddtype is used in my research) but life is life and I'll adapt my own timelines if need be

@SwayamInSync
Copy link
Member Author

@SwayamInSync do you mind waiting a couple weeks on this? I'd like to have Matti look because he added same_value casting and also has a try at this in #161.

In the meantime: one thing that would help is to make the PR more atomic. There's a fair amount of renaming and other stuff like that you could do in another PR then rebase this once the other PR is merged to reduce the diff size.

In general I'd much rather review several smaller more focused PRs rather than one big PR.

Right, I also didn't want to make it this big, but it the design choice of using union with SIMD types in C++ gave a lot of ABI breaks (as __float128 isn't standard so on x86-64 machines, doing careless copies gives heisenbugs)
I can do some independent refactor PRs, and since we are allowing some time window I'll also get some work to do on docs 😄

@SwayamInSync
Copy link
Member Author

Cool so the diff got down now, mostly again covered by casts.cpp and tests

@SwayamInSync
Copy link
Member Author

Hi @mattip gentle ping to get this in your radar.

@juntyr
Copy link
Contributor

juntyr commented Jan 13, 2026

I'm also hoping that we can merge this PR and publish v1.0 soon

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass over the C code and spotted some issues. I'd appreciate it if you could look elsewhere for similar patterns to the main issue I spotted: setting an error in an error path when an error is already set

}

// Helper function for quad-to-quad same_value check (inter-backend)
// NOTE: the inter-backend uses `double` as intermediate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does SLEEF not have any utilities for working with 80 bit floats? It’d be nice to not have this restriction. But also not critical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They does not have any explicit cross-platform utilities for 80-bit floats.
That's why only values perfectly castable from "longdouble" -> "quad" are those that are under the "double" range (because of this intermediate)

If someone still wants the perfect cast then ld -> string -> quad is the route


// Perform same_value check if requested
if (same_value_casting) {
if (quad_to_string_same_value_check(&in_val, str_buf, str_size, backend) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for the same value not to be preserved here? I don't think it is - StringDType shouldn't truncate data. That means this check isn't necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes right, my bad
we shouldn't be needing same_value check here.

}

static inline int
quad_to_string_same_value_check(const quad_value *in_val, const char *str_buf, npy_intp str_len,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could avoid allocating a temporary buffer to store a string if you knew how many digits are necessary to represent the quad precisely. Is that exposed by SLEEF?

I ask because this whole process - allocating and freeing a temporary and parsing the truncated string, seems very slow and probably adds a lot of overhead to the cast operation for the fixed-width strings.

Copy link
Member Author

@SwayamInSync SwayamInSync Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a nuance here, let me explain so 36 for exact decimal digits and 42 for scientific notation
They are not exposed by SLEEF, we keep the max string len to 50 (42 + some buffer)

Now when casting a QuadPrecision value to a fixed-width string (like U10) then the dragon4 will generate full buffer representation but it will get truncated to only 10 chars. Then here we in same_value check we need to re-create the null-terminated buffer.

Oh good point, that buffer can be in local stack, we don't be needing the heap allocation here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 1 character for sign
  • 1 character for leading digit
  • 1 character for decimal point
  • 36 significant digits
  • 1 character for e
  • 1 character for exponent sign
  • 4 characters for exponent (max is 4932)
  • 1 null terminator

So it should be total of 46 characters (still 50 is safe)

val_str);
}
else {
PyErr_SetString(PyExc_ValueError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong: you shouldn't set a new error here if quad_to_string_adaptive_cstr already set an error. I'd just bubble up the error instead of trying to give more context here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

val_str);
}
else {
PyErr_SetString(PyExc_ValueError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue here - an error is already set, setting a new one without resetting the error indicator is wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

val_str);
}
else {
PyErr_SetString(PyExc_ValueError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending the already-mentioned corrections. Tests should be catching edge cases.

@SwayamInSync
Copy link
Member Author

SwayamInSync commented Jan 14, 2026

@ngoldbaum these new changes should address all the reviews, I found 2 more cases of error-overriding and patched them here (inside scalar.c and umath.cpp)

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience here!

@SwayamInSync
Copy link
Member Author

Great! Merging this in.
Thanks everyone 🙌

@SwayamInSync SwayamInSync merged commit c182ddb into numpy:main Jan 14, 2026
11 checks passed
@juntyr
Copy link
Contributor

juntyr commented Jan 15, 2026

Thanks! How long do you think will it now take to release v1.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use same_value casting in astype

4 participants