bug: `dt.highest_precedence([dt.uint64, dt.int32])` gives `uint64`, not `int64` #7331

NickCrews · 2023-10-11T18:32:29Z

What happened?

Came from exploring around exposing highest_precedence() publicly per #5707

from ibis.expr import datatypes as dt

dt.highest_precedence([dt.int32, dt.uint64])

gives uint64. I would expect int64, because if the int32 column held a negative number, I would want it to stay negative in the unioned dtype. Maybe I'm not understanding this correctly, though.

EDIT: this is a bit wrong, see cpcloud's comment below.

What version of ibis are you using?

main

What backend(s) are you using, if any?

NA

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct


cpcloud · 2023-10-11T19:56:24Z

The correct answer is probably int128 (but we don't have that type, and if we did, we would have the same problem again one level up).

I think I would expect the current system to error if there's no type that can represent the set of values common to both types.

int64 wouldn't be correct here because it cannot represent many uint64 values.
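To make the range argument concrete, here is a minimal sketch in plain Python (no ibis involved). The helper `int_range` is a hypothetical name introduced only for this illustration.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the inclusive (min, max) of a fixed-width integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2**bits - 1


int32_min, int32_max = int_range(32, signed=True)
int64_min, int64_max = int_range(64, signed=True)
uint64_min, uint64_max = int_range(64, signed=False)

# uint64 cannot hold the negative int32 values ...
print(int32_min < uint64_min)  # True
# ... and int64 cannot hold the upper half of uint64.
print(uint64_max > int64_max)  # True
# A signed type needs at least 65 bits to cover both ranges,
# which is why the next power-of-two width would be int128.
int65_min, int65_max = int_range(65, signed=True)
print(int65_min <= int32_min and uint64_max <= int65_max)  # True
```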

gforsyth · 2023-10-11T20:19:37Z

I think this means we would remove the ability to cast from signed to unsigned integers in all cases, which is more restrictive than the backends actually are -- although I recognize this is a bit intractable without knowing the bounds of the values in a given column.

NickCrews · 2023-10-11T21:59:06Z

> int64 wouldn't be correct

oh I see, my bad.

> I think this means we would remove the ability to cast from signed to unsigned integers in all cases, which is more restrictive than the backends actually are -- although I recognize this is a bit intractable without knowing the bounds of the values in a given column.

All uint32s are representable by int64s, so casting in that case should always be allowed. Are you talking about the uintX to intX case? I think casting there should be allowed because backends support it.

I think there are two different functionalities here:

"is x castable to y?"

This is what needs to happen during an explicit cast() call. It is much more lenient, because the user is asking for the cast explicitly.

castable(uint64, int64) == true
castable(list, float) == false

"find a dtype that can represent both dtypes safely"

This is what happens during an implicit cast. Currently we rely on castable() to do this, but that is wrong. For instance, with #7332 applied we get the wrong behavior (grep through the codebase for usages of highest_precedence to look for other vulnerable spots):

import ibis

uint8s = ibis.array([255]).cast("array<uint8>")
int8s = ibis.array([127, -128]).cast("array<int8>")
x = uint8s.concat(int8s)
print(x.type())
print(x)
# array<uint8>
# [255, 127, -128]

Really this should return array<int16>, or it should error, not sure which. In the case of array<uint64> and array<int64>, this should definitely error because there is no safe implicit casting option.

I would expect these properties:

If z = find_common(x, y), then

z might be x or y, but not necessarily.
castable(x, z) to be true
castable(y, z) to be true

If castable(a, b) is true, then find_common(a, b) might error or might find something: (uint64, int64) is castable but has no common type.

If castable(c, d) is false, then find_common(c, d) must error: (float, list) are neither safely nor unsafely castable.
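The properties above can be sketched for the integer types only. This is a toy model, not ibis code: `IntType`, `contains`, and the bodies of `castable` and `find_common` are all hypothetical stand-ins, and 64 bits is assumed to be the widest available type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical stand-in for an integer dtype."""

    bits: int
    signed: bool

    def bounds(self) -> tuple[int, int]:
        if self.signed:
            return -(2 ** (self.bits - 1)), 2 ** (self.bits - 1) - 1
        return 0, 2**self.bits - 1


# Candidates ordered from narrowest to widest, unsigned before signed.
CANDIDATES = [IntType(b, s) for b in (8, 16, 32, 64) for s in (False, True)]


def castable(src, dst) -> bool:
    """Lenient rule for explicit casts: any integer to any integer."""
    return isinstance(src, IntType) and isinstance(dst, IntType)


def contains(outer: IntType, inner: IntType) -> bool:
    """True if every value of `inner` is representable in `outer`."""
    lo_o, hi_o = outer.bounds()
    lo_i, hi_i = inner.bounds()
    return lo_o <= lo_i and hi_i <= hi_o


def find_common(x, y) -> IntType:
    """Strict rule for implicit casts: narrowest type that holds both."""
    if not (castable(x, y) and castable(y, x)):
        raise TypeError(f"no cast between {x} and {y}")
    for candidate in CANDIDATES:
        if contains(candidate, x) and contains(candidate, y):
            return candidate
    raise TypeError(f"no common type for {x} and {y}")


uint8, int8 = IntType(8, False), IntType(8, True)
uint64, int64 = IntType(64, False), IntType(64, True)

print(find_common(uint8, int8))  # IntType(bits=16, signed=True)
print(castable(uint64, int64))   # True, yet find_common(uint64, int64) raises
print(castable(list, float))     # False
```

In this model the result of `find_common` is always a type both inputs are castable to, while the reverse implication does not hold, which matches the split between explicit and implicit casts described above.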

gforsyth · 2023-10-12T14:46:32Z

> Are you talking about the uintX to intX case?

No, those are tractable by going up by a power of 2 in bit-width, but int to uint requires knowledge of the underlying value to be able to say whether the cast is valid or not.
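As a rough illustration of that asymmetry, the sketch below uses a hypothetical helper `fits` in plain Python: widening uintX to a signed type of twice the width is safe for every possible value, while intX to uintX can only be judged by looking at the data.

```python
def fits(values, bits: int, signed: bool) -> bool:
    """True if every value is representable in the target integer type."""
    if signed:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        lo, hi = 0, 2**bits - 1
    return all(lo <= v <= hi for v in values)


# uint8 -> int16: every possible uint8 value fits, no data inspection needed.
print(fits(range(0, 256), bits=16, signed=True))  # True

# int8 -> uint8: the answer depends on the values actually present.
print(fits([0, 5, 127], bits=8, signed=False))    # True
print(fits([127, -128], bits=8, signed=False))    # False
```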

I think the split you've identified is good! So if a user says "just cast it" we can allow it (within reason) and then let the backend fail or succeed.

Implicit casting should find a dtype that can encompass both dtypes (or the implicit cast should fail, if it cannot).

NickCrews · 2023-10-12T16:41:42Z

> Are you talking about the uintX to intX case?

Ah, sorry, I wasn't clear. I think we're on the same page: I meant that X was the same width for both, e.g. uint32 to int32.

gforsyth · 2024-07-31T17:47:18Z

There are valid issues here, but it's not currently blocking any of our work and fixing it is liable to be an insane 🐰 🕳️ -- if someone is interested in tackling this work without breaking the entire project, we're game to talk it through, but this isn't going to get picked up any time soon.

NickCrews added the bug Incorrect behavior inside of ibis label Oct 11, 2023

NickCrews mentioned this issue Oct 11, 2023

bug: ibis.literal(129, type="uint8") gives TypeError #7334

Closed

1 task

NickCrews mentioned this issue Oct 23, 2023

feat: upcast dtype on union of two schemas #5707

Closed

1 task

lostmygithubaccount added this to Ibis planning and roadmap Nov 30, 2023

github-project-automation bot moved this to backlog in Ibis planning and roadmap Nov 30, 2023

ncclementi mentioned this issue Apr 26, 2024

[meta] Ibis expression API stability #8996

Closed

10 tasks

ncclementi mentioned this issue Jul 19, 2024

[meta]: Ibis expression and backend stability #9638

Open

13 tasks

gforsyth closed this as not planned Jul 31, 2024

github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Jul 31, 2024
