Description
I would like to be able to type a Dataframe-like object with mypy, where different columns have different types and each column can be accessed as an attribute on the dataframe. This is how libraries like Pandas and Ibis work.
Generally, this requires a function whose return type depends on the string literal it is called with, i.e. a mapping from string literals to types (record kinds).
Here is a mock example implemented in TypeScript, which type-checks correctly:
```typescript
class Column {
  mean(): number {
    return 0;
  }
}

class GeoColumn extends Column {
  length(): number {
    return 0;
  }
}

class Dataframe<T extends { [key: string]: Column }> {
  constructor(private cols: T) {}

  // K is restricted to the keys of T; the return type is T's value type at K.
  getColumn<K extends keyof T>(name: K): T[K] {
    return this.cols[name];
  }
}

const d = new Dataframe({ name: new Column(), location: new GeoColumn() });
d.getColumn("name").mean();
// We can call `length` because this is a GeoColumn
d.getColumn("location").length();
```
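For comparison, the closest spelling available in Python today is one `typing.overload` per column, each taking a `Literal` argument. In the sketch below, `CitiesFrame` and `get_column` are hypothetical names chosen for illustration. mypy does get per-column return types this way, but only for one hard-coded schema; the mapping cannot be written once for an arbitrary `T` the way `getColumn<K extends keyof T>` is in TypeScript.

```python
from typing import Dict, Literal, overload


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


class CitiesFrame:
    """Hypothetical frame with a fixed schema: name -> Column, location -> GeoColumn."""

    def __init__(self) -> None:
        self._cols: Dict[str, Column] = {"name": Column(), "location": GeoColumn()}

    # One overload per column: the literal argument selects the return type.
    @overload
    def get_column(self, name: Literal["name"]) -> Column: ...
    @overload
    def get_column(self, name: Literal["location"]) -> GeoColumn: ...

    def get_column(self, name: str) -> Column:
        # Runtime implementation shared by all overloads.
        return self._cols[name]


d = CitiesFrame()
d.get_column("name").mean()
d.get_column("location").length()  # mypy infers GeoColumn for this call
```

Every new schema needs its own set of overloads, which is exactly the boilerplate a generic solution would remove.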
Possible Syntaxes
Here are a few possible ways this could be spelled in Python:
`self` as `TypedDict`
Since we already have a `TypedDict` construct, one of the least invasive approaches is to type `self` as a `TypedDict`.
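`TypedDict` is a natural fit because mypy already performs the literal-key-to-type lookup this proposal needs; it just cannot be threaded through a generic class. A minimal illustration, where `CitiesCols` is a hypothetical schema for one particular frame:

```python
from typing import TypedDict


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


# Hypothetical schema for one particular frame.
class CitiesCols(TypedDict):
    name: Column
    location: GeoColumn


cols: CitiesCols = {"name": Column(), "location": GeoColumn()}

# mypy resolves each literal key to its declared value type,
# so .length() type-checks only on the "location" entry.
cols["name"].mean()
cols["location"].length()
```

At runtime `cols` is an ordinary `dict`; the per-key types exist only for the type checker.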
This would probably require anonymous `TypedDict`s, which were proposed previously (python/mypy#985 (comment)).
It would also require `TypedDict`s to be able to take generic parameters.
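The nearest existing thing to an anonymous `TypedDict` is the functional syntax, which lets the schema be written inline as a dict literal but still has to be bound to a name before mypy accepts it in an annotation. This is a sketch with hypothetical names (`CitiesCols`, `describe`):

```python
from typing import TypedDict


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


# Functional form: the schema is an inline dict, but mypy requires the
# result to be assigned to a variable with the same name as the string.
CitiesCols = TypedDict("CitiesCols", {"name": Column, "location": GeoColumn})


def describe(cols: CitiesCols) -> int:
    # mypy knows cols["location"] is a GeoColumn here.
    return cols["location"].length()
```

An anonymous form would allow something like the schema above to appear directly inside `Dataframe[...]` without the intermediate name.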
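```python
from __future__ import annotations  # keeps the proposed annotation syntax unevaluated at runtime

from typing import Dict, Generic, TypedDict, TypeVar


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


T = TypeVar("T", bound=Dict[str, Column])
K = TypeVar("K", bound=str)
V = TypeVar("V", bound=Column)


class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

    # Proposed syntax (not accepted by mypy today): bind K and V by matching
    # self against an anonymous, generic TypedDict.
    def __getattr__(self: Dataframe[TypedDict({K: V})], name: K) -> V:
        return self.cols[name]


d = Dataframe({"name": Column(), "location": GeoColumn()})
d.name.mean()
d.location.length()
```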
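Until something like this exists, one possible workaround (an assumption, not an approach mypy documents for this case) is to restate the schema as bare attribute annotations on a per-schema subclass. `CitiesFrame` is a hypothetical name. This gives mypy the per-column types, but loses inference from the constructor argument, which is what the proposal is after.

```python
from typing import Dict, Generic, TypeVar


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


T = TypeVar("T", bound=Dict[str, Column])


class Dataframe(Generic[T]):
    def __init__(self, cols: T) -> None:
        self.cols = cols

    def __getattr__(self, name: str) -> Column:
        # Only called when normal attribute lookup fails, i.e. for column names.
        try:
            return self.cols[name]
        except KeyError:
            raise AttributeError(name) from None


class CitiesFrame(Dataframe[Dict[str, Column]]):
    # Bare annotations create no class attributes, so at runtime these
    # lookups still fall through to __getattr__; mypy uses the declared types.
    name: Column
    location: GeoColumn


d = CitiesFrame({"name": Column(), "location": GeoColumn()})
d.name.mean()
d.location.length()  # mypy sees GeoColumn here
```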
Type Level `.keys` and `__getitem__`
Another option would be to mirror how TypeScript does this, by introducing type-level `keys` and `__getitem__` functions. This would also require generics that can depend on other generics (python/mypy#2756).
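```python
from typing import Dict, Generic, TypeVar

# Proposed syntax only: KeyOf and GetItem do not exist in `typing`.
# Column is defined as in the previous example.
T = TypeVar("T", bound=Dict[str, Column])
K = TypeVar("K", bound=KeyOf[T])

class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

    def __getattr__(self, name: K) -> GetItem[T, K]:
        return self.cols[name]
```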
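What the proposed `KeyOf[T]` and `GetItem[T, K]` would compute statically can already be observed at runtime from a `TypedDict` via `typing.get_type_hints`. The helpers `key_of` and `get_item` are hypothetical names used only to illustrate the intended semantics; they are ordinary functions, not type-level operators, and mypy cannot use their results.

```python
from typing import Set, TypedDict, get_type_hints


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


class CitiesCols(TypedDict):
    name: Column
    location: GeoColumn


def key_of(td: type) -> Set[str]:
    """Runtime analogue of the proposed KeyOf[T]: the declared keys."""
    return set(get_type_hints(td))


def get_item(td: type, key: str) -> type:
    """Runtime analogue of the proposed GetItem[T, K]: the value type at key."""
    return get_type_hints(td)[key]


print(sorted(key_of(CitiesCols)))  # ['location', 'name']
print(get_item(CitiesCols, "location").__name__)  # GeoColumn
```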
Conclusion
I would like to have a way to type Dataframes that have different column types in a generic way. This is useful for typing frameworks like Ibis or Pandas.
I believe this is somewhat related to variadic generics (#193). Also related: dropbox/sqlalchemy-stubs#69