Replies: 4 comments 5 replies
This is really interesting! I was glossing over until I got to the Python <-> native type interop. Will there be some way for a DataFrame library to specify a mapping/conversion of Python to native types? In particular, it would be nice if something like this would type check:

```python
def int_to_str(x: int) -> str:
    return str(x)

def str_to_int(x: str) -> int:
    return int(x)

Series([1, 2, 3, 4], dtype=int64).apply(int_to_str).dtype  # utf8
Series([1, 2, 3, 4], dtype=int64).apply(str_to_int)  # error
```

This also has interplay with dynamic assignment of series / creation of a derived TD. An operation that is very common with data frames but uncommon with `TypedDict` is re-assigning a field to a different type:

```python
df = DataFrame(dtypes={"a": int64, "b": utf8})
df["b"] = Series(dtype=float)
```
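One way a stub could partially approximate the first request today is to make `apply` generic in the callback's return type, so the result's element type is inferred from the callback. This is a simplified sketch with a hypothetical `Series` class, not pandas' actual API:

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")

class Series(Generic[T]):
    """Hypothetical, simplified Series: element type only, no native dtype."""

    def __init__(self, data: list[T]) -> None:
        self.data = data

    def apply(self, fn: Callable[[T], R]) -> "Series[R]":
        # The result's element type R is inferred from the callback,
        # so applying an int -> str function yields Series[str].
        return Series([fn(x) for x in self.data])

s = Series([1, 2, 3, 4]).apply(str)  # inferred as Series[str]
```

Mapping `str` back to a native dtype such as `utf8`, as asked above, would additionally need the proposed Python-to-native conversion table.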
I like this idea. One small point: how does this interact with PEP 695?
Here's an alternative implementation strategy that would work well with PEP 695 (without a need for further syntax changes) and perhaps make the behavior more obvious (cross-posted from https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569/15?u=jelle):
```python
from typing import TypedDict

class MyGeneric[TD: TypedDict]: ...

d: MyGeneric[TypedDict[{"foo": int, "bar": str, "baz": bool}]] = ...
```

This would work mostly like an existing TypeVar, except that type checkers would allow TypedDict-specific operations on values of that type:

```python
from typing import Literal, TypedDict, KeyType

def want_literals(arg: Literal["a", "b"]): ...

arg1: KeyType[TypedDict[{"a": int, "b": str}]]
want_literals(arg1)  # ok

arg2: KeyType[TypedDict[{"a": int, "c": str}]]
want_literals(arg2)  # rejected, Literal["a", "c"] is incompatible with Literal["a", "b"]
```

This operator would work on both concrete TypedDicts and TypeVars bound to TypedDict or a subtype. We would also add

Edit: Eric actually suggested something very similar above (#1387 (reply in thread)).
There may be value in extending this mechanism to `Enum`s:

```python
from __future__ import annotations

from enum import auto, Enum
from typing import Literal, Union

class SomeEnum(Enum):
    ALFA = auto()
    BRAVO = auto()

SomeEnumName = Literal['ALFA', 'BRAVO']  # not DRY

def some_func(some_enum: Union[SomeEnum, SomeEnumName]) -> None:
    # preprocess 'some_enum' arg:
    if isinstance(some_enum, str):
        some_enum = SomeEnum[some_enum]
    assert isinstance(some_enum, SomeEnum)
    # do useful things
    ...
```
prior discussion:
Table of contents
- How do `.key` and `.value` work?
- How does `Map` work?
- Why is `**` unpacking not needed?
- Bonus feature: `TD.key_union` and `TD.value_union`
- Comparison to `dataclass_transform`

### Basic idea
Nikita Sobolev proposed a nice inline syntax for TypedDict which will probably end up looking like this:
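Judging from the inline forms used later in this post (e.g. `TypedDict[{"col1": np.int64}]`), the proposal is presumably a dict-literal subscript of `TypedDict`. That subscript form is still only a proposal, so the runnable part of this sketch uses today's functional syntax for the same type:

```python
from typing import TypedDict

# Today's functional syntax for a TypedDict defined from a dict of fields:
Movie = TypedDict("Movie", {"name": str, "year": int})

# The proposed inline syntax would express the same shape directly in an
# annotation, without a named class (not valid in any released checker):
#   def f(m: TypedDict[{"name": str, "year": int}]) -> None: ...

m: Movie = {"name": "Alien", "year": 1979}
```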
The nice thing is that it doesn’t require any grammar change in Python.
I think the problem of “key types” for Pandas `DataFrame`s and other TypedDict-like containers can be solved the same way! Without any grammar changes! We just need a new TypeVar-like: `TypeVarDict`, which is a generalization of `TypeVarTuple` but also shares a lot of traits with `ParamSpec`. Basic usage is simple; to make it really useful, we also need `TD.key`, `TD.value` and the new special form `Map`.

As a motivating example, you could type-annotate `pandas.DataFrame` with this (the most important part is the definition of `__getitem__`). Now I’ll explain everything in more detail.
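As a hypothetical sketch of the basic usage and the motivating `DataFrame` example: nothing here runs today, since `TypeVarDict`, `TD.key`, `TD.value` and the dict-literal subscript are all proposed, and `Series`/`DataFrame` are illustrative stand-ins for pandas:

```python
# Hypothetical sketch -- TypeVarDict does not exist in typing or any checker.
from typing import Generic, TypeVar

T = TypeVar("T")
TD = TypeVarDict("TD")  # hypothetical

class Series(Generic[T]): ...

class DataFrame(Generic[TD]):
    # TD.key and TD.value pair up per column, so indexing with the
    # literal "col1" returns a Series of that column's dtype:
    def __getitem__(self, key: TD.key) -> Series[TD.value]: ...
    def __setitem__(self, key: TD.key, value: Series[TD.value]) -> None: ...

df: DataFrame[{"col1": np.int64, "col2": np.float64}] = ...
reveal_type(df["col1"])  # Series[np.int64]
```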
### How do `.key` and `.value` work?

If `TD` is a `TypeVarDict`, then whenever you use `TD.key` in a function signature, you also have to use `TD.value`, and vice versa (just like with `ParamSpec`’s `.args` and `.kwargs`). `TD.key` and `TD.value` are essentially expanded as overloads: a class that uses `TD.key` and `TD.value` in its `__setitem__` method is equivalent to one with a separate `__setitem__` overload for each key/value pair of the concrete dict type.
`TD.key` and `TD.value` can also appear in the return type.

### How does `Map` work?

To really make `TypeVarDict` useful, the special form `Map` has to be introduced as well (`Map` was introduced in this proto-PEP). It works by applying a type constructor to each value type of the dict type.
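A hypothetical illustration of the intended semantics (`Map` is not implemented anywhere; the behavior below is inferred from the `read_csv` discussion that follows):

```python
# Hypothetical semantics of Map:
#
#   Map[type, {"col1": np.int64, "col2": np.float64}]
#       == {"col1": type[np.int64], "col2": type[np.float64]}
#
# Matching works in reverse during inference:
#
#   dict[{"col1": type[np.int64]}]  matched against  dict[Map[type, TD]]
#       ==> TD is inferred as {"col1": np.int64}
```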
This is needed, for example, in the definition of `read_csv`. The `dtype` object that you pass in will look something like `{"col1": np.int64}`, but that has type `dict[{"col1": type[np.int64]}]`, and not `dict[{"col1": np.int64}]`, which is what we need in order to infer the correct type for the `DataFrame`. So, the `type[]` needs to be stripped away somehow. That is what `Map` does: the `dtype` we pass in has type `dict[{"col1": type[np.int64]}]`, which gets matched against `dict[Map[type, TD]]`, which means that `TD` is inferred as `{"col1": np.int64}`, just as we wanted.

Aside: the proto-PEP linked above also defines `Map` to be used on `TypeVarTuple`s.

### Why is `**` unpacking not needed?

In PEP 646, where `TypeVarTuple` was introduced, it is specified that a `TypeVarTuple` must always be unpacked with `*`, as in `Tuple[*Ts]`. Why is this not needed here? Because there's not actually anything to spread. In this way, `TypeVarDict` is more akin to `ParamSpec` than to `TypeVarTuple`.

Consider a class `A` that is generic over `*Ts`: it can be parameterized as `A[int, str]` or `A[bool, bool, bool]`. That is, the `*Ts` takes up an arbitrary number of “top-level slots” in the `A[...]` expression. But a class `B` that is generic over a `ParamSpec` only takes up one “top-level slot”, as in `B[[str, str]]`, meaning that another `TypeVar` could come after it: a class `C` generic over `[**P, T]` can be parameterized as, for example, `C[[int, str], bool]`. And indeed, it is very common to have a `TypeVar` after a `ParamSpec` (which isn’t possible with `TypeVarTuple`).

Like `ParamSpec`, `TypeVarDict` also only takes up one “top-level slot”: a class `D` generic over `[TD, T]` can be `D[{"foo": str}, bool]` or `D[{"bar": bool, "baz": bool}, str]`. So, as with `ParamSpec`, there should be no unpacking with `TypeVarDict`.

The unpacking is only needed for annotating `**kwargs` as specified in PEP 692, where `TD` acts like an arbitrary `TypedDict`. Though, the grammar change in PEP 692 was rejected, so in practice it is spelled `**kwargs: Unpack[TD]`.
### Bonus feature: `TD.key_union` and `TD.value_union`

In addition to `TD.key` and `TD.value`, there could also be `TD.key_union` and `TD.value_union`. `TD.key_union` would be the union of all key literals and `TD.value_union` would be the union of all value types. This would, for example, be useful for typing `.keys()` and `.values()` in `TypedDict`s.

### Who would use this?
- Any library that has DataFrame-like objects
- Dict-wrappers like `ModuleDict` in PyTorch
- Potentially, ORMs like SQLAlchemy?
### Aside on the rejected PEP 637

PEP 637 proposed to add the syntax `matrix[row=20, col=40]`, which would have been a perfect fit for `TypeVarDict`, but I think the syntax with the curly braces is also fine.

### Comparison to `dataclass_transform`

PEP 681’s `dataclass_transform` allows us to create a base class such that all subclasses act like `dataclass`es. This allows you to get somewhat similar behavior to the proposed `TypeVarDict`, but I see several shortcomings:

- no `Map` functionality, which allows us to return a `Series` object for `df["col1"]` instead of just the `dtype`, as in the above `dataclass_transform` example