Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overloads and intersections; research and analysis #860

Open
KotlinIsland opened this issue Jan 11, 2025 · 4 comments
Open

overloads and intersections; research and analysis #860

KotlinIsland opened this issue Jan 11, 2025 · 4 comments

Comments

@KotlinIsland
Copy link
Owner

KotlinIsland commented Jan 11, 2025

the point

  • overloads are not intersections
  • hypothetical unordered overload resolution that provides intersection distribution

preconditions

all code samples can be assumed to use these definitions:

class A: ...
class B: ...
class C(A, B): ...
type FA = (A) -> A
type FB = (B) -> B
type FInt = (int) -> int
type FStr = (str) -> str
type FObject = (object) -> object

primer

overloads match by signature order:

@overload
def f(a: object) -> object: ...
@overload
def f(a: str) -> str: ...
    
f("")  # object

overload invocation should distribute unions

when a call can be split into multiple parts via distribution of union,
a valid/more specific return type can be derived:

@overload
def f(a: int) -> int: ...
@overload
def f(a: str) -> str: ...

x: int | str
f(x)  # int | str; 
      # we distribute the union into two separate calls:
      #   f(int) -> int
      #   f(str) -> str
      # combining the result of both of these calls reveals `int | str`

this is safe and correct because each signature of the overload is valid with all portions of the union
and the implementation should be guaranteed to return each correctly

this isn't unique to overloads though, and also applies to generic functions:

def f[T](d: list[T]) -> set[T]:
    ...
data: list[int] | list[str]
f(data)  # split the union into separate calls
         #   f(list[int]) -> set[int]
         #   f(list[str]) -> set[str]
         # combined return types: set[int] | set[str]

i suppose this is because one way of thinking of generic functions is that they (kind of) represent an overload of every possible type

overload implementations need to follow the semantics of the signature matching

@overload
def f(a: object) -> int: ...
@overload
def f(a: str) -> str: ...
def f(a):
    if isinstance(a, str):
        return "hi"  # should receive an error here, as the `isinstance` has selected from both overloads
                     #  but `str` doesn't match the first overload
    return 1
f("") # matches object overload so should be None, but in reality it will be "hi"

therefore we can deduce that any overload signature matching semantics cannot be invalidated by hypothetical implementations, because they must follow the same rules

shorthand proposal

i propose an Overload type that would be equivalent to a Protocol with the equivalent overloads:

class IntThenObject(Protocol):
    @overload
    def __call__(self, x: int, /) -> int: ...
    @overload
    def __call__(self, x: object, /) -> object: ...

int_then_object: IntThenObject
assert_type(int_then_object, Overload[FInt, FObject])  # these two types are identical

# but not to differently ordered `Overload`s, this is how it currently works
_: Overload[FObject, FInt] = int_then_object  # error

the problem

overloaded functions are often referred to as intersections, but this is not true.
overloads are neither supertypes nor subtypes of a corresponding intersection

this is true for many reasons, primarily:

here we demonstrate that Overload[FInt, FStr] is not a subtype of FInt & FStr

class Demo[In, Out]:
    def __init__(self, f: (In) -> Out):
        self._f = f

    def put(self, value: In):
        self._value = value
    def get(self) -> Out:
        return self._f(self._value)

@overload
def f(x: int) -> int: ...
@overload
def f(x: str) -> str: ...
def f(x):
    return x

d: Demo[int, int] & Demo[str, str] = Demo(f)
d_int: Demo[int, int] = d
d_str: Demo[str, str] = d

d_int.put(1)
d_str.get()  # expected: str, runtime: 1

additionally, if intersections of callables were considered as overloads:

a: FObject & FStr
a("")  # object
b: FStr & FObject
b("")  # str

if this were the case, then intersections could not be commutative. this is so absurd that it won't be considered

within this document i will examine these two ideas:

  • overloads aren't the same as intersections
  • change overload semantics to not depend on order

option 1: overloads aren't the same as intersections

callable intersection return type

intersections of callables should follow normal intersection rules and return intersections:

f: FA & FB
f(A())  # `A & B`

regarding synthesis of overloads from an intersection:

class FnObj:
    def f(self, o: object) -> object: ...
class FnStr:
    def f(self, o: str) -> str: ...
x: FnObj & FnStr
x.f  # `FObject & FStr`, this doesn't form an overload, instead just an intersection, because there is no way to 
     #  specify an overload order

Option 2: overload semantics that don't depend on order

given this scenario:

@overload
def f(a: A) -> A: ...
@overload
def f(a: B) -> B: ...

if we were to apply some hypothetical non-ordered matching semantics,
then regarding implementations of f, we could say:

def f(a):
    if isinstance(a, A):
        return A()  # error, this branch could match both overloads
    if isinstance(a, B):
        return B()  # no error, the first overload has been ruled out
def f(a):
    if isinstance(a, A):
        return C()  # no error, this value matches both overloads
    if isinstance(a, B):
        return B()  # no error, the first overload has been ruled out
def f(a):
    if isinstance(a, A) and not isinstance(a, B):
        return A()  # no error, this branch only matches the first overload
    if isinstance(a, B):
        return B()  # error, this could match the first overload as it hasn't been exhausted
    # error: the first overload hasn't been exhausted
def f(a):
    if isinstance(a, A) and not isinstance(a, B):
        return A()  # no error, this branch only matches the first overload
    if isinstance(a, B) and not isinstance(a, A):
        return B()  # no error, this branch only matches the second overload
    # error, neither overload has been exhausted, must return an A & B here

def f(a):
    return a  # no error, matches both overloads correctly

considering this invocation:

f(C())

traditional overload semantics would suggest that the return type would be A,
but if we apply these non-ordered semantics, then the call with match both overloads and return A & B

regarding synthesis of overloads from an intersection:

class FnObj:
    def f(self, o: object) -> object: ...
class FnStr:
    def f(self, o: str) -> str: ...
x: FnObj & FnStr
x.f  # `Overload[FInt, FObject]`, this can cleanly form an overload as there is no need for order,
     #  the class that implements will be required to implement it as an overload
@jorenham
Copy link
Collaborator

overload invocation should distribute unions

when a call can be split into multiple parts via distribution of union, a valid/more specific return type can be derived:

@overload
def f(a: int) -> int: ...
@overload
def f(a: str) -> str: ...

x: int | str
f(x)  # int | str; 
      # we distribute the union into two separate calls:
      #   f(int) -> int
      #   f(str) -> str
      # combining the result of both of these calls reveals `int | str`

this is safe and correct because each signature of the overload is valid with all portions of the union and the implementation should be guaranteed to return each correctly

Technically speaking, this is only valid if you define | to be the disjoint union. Because if you don't (as is currently the case, according to the typing spec, even though overloads in mypy and pyright both seem to ignore that in case of overloads), then an additional 3rd overload f(a: int | str) -> complex would cause f(x) to be inferred as complex.

I think that's unintuitive, and that it makes a lot more sense that int | str actually means "either int or str". That way yoiu know for certain that if y: int | str isn't an int, it's a str, and vice-versa. Without disjoint unions, that isn't the case, as the 3rd f(x: int | str) -> complex shows. Because in case of disjoint unions, that 3rd overload wouldn't be matched, because the first two overloads already covered all of the possible cases.

@KotlinIsland
Copy link
Owner Author

KotlinIsland commented Jan 11, 2025

then an additional 3rd overload f(a: int | str) -> complex would cause f(x) to be inferred as complex.

i disagree with this, if this was the semantics, how could an implementation possibly conform to that overload?

@overload
def f(a: int) -> int: ...
@overload
def f(a: str) -> str: ...
@overload
def f(a: str | int) -> complex: ...
def f(a):
    if isinstance(a, int):
        return 1
    if isinstance(a, str):
        return ""

def fn(x: int | str):
    f(x)  # expected: complex, actual: 1
fn(1)

additionally, you should receive an error:

def f(a: str | int) -> complex: ...  # error: overload overlaps

playground

@jorenham
Copy link
Collaborator

jorenham commented Jan 11, 2025

type= object
type= Never

type Sets[InT] = Callable[[InT], None]
type Gets[OutT] = Callable[[], OutT]
type Maps[InT, OutT] = Callable[[InT], OutT]

class A: ...
class B: ...
%%{
    init: {
        "themeVariables": {"wrap": "false"},
        "flowchart": {
            "curve": "linear",
            "markdownAutoWrap":"false",
            "wrappingWidth": "600"
        }
    }
}%%
graph BT
    N["⊥"]
    
    N --> SO["Sets[⊤]"]
    SO --> SAuB["Sets[A | B]"]
    SAuB --> SA["Sets[A]"]
    SA --> SAnB["Sets[A & B]"]
    SAuB --> SB["Sets[B]"]
    SB --> SAnB
    SAnB --> SN["Sets[⊥]"]
    SN --> O["⊤"]
    
    N --> MON["Maps[⊤, ⊥]"]
    MON --> MAuBAnB["Maps[A | B, A & B]"]
    MAuBAnB --> MAA["Maps[A, A]"]
    MAA --> MAnBAuB["Maps[A & B, A | B]"]
    MAuBAnB --> MAB["Maps[A, B]"]
    MAB --> MAnBAuB
    MAuBAnB --> MBA["Maps[B, A]"]
    MBA --> MAnBAuB
    MAuBAnB --> MBB["Maps[B, B]"]
    MBB --> MAnBAuB
    MAnBAuB --> MNO["Maps[⊥, ⊤]"]
    MNO --> O
    
    N --> GN["Gets[⊥]"]
    GN --> GAnB["Gets[A & B]"]
    GAnB --> GA["Gets[A]"]
    GA --> GAuB["Gets[A | B]"]
    GAnB --> GB["Gets[B]"]
    GB --> GAuB
    GAuB --> GO["Gets[⊤]"]
    GO --> O
Loading

@jorenham
Copy link
Collaborator

i disagree with this, if this was the semantics, how could an implementation possibly conform to that overload?

It can't! That's why non-disjoint unions are stupid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants