
Record Types #685

Open
@saulshanabrook


I would like to be able to type a DataFrame-like object with mypy, where different columns have different types and you can get each column as an attribute on the dataframe. This is how libraries like Pandas and Ibis work.

Generally, this requires a function whose return type depends on the string literal it is passed, i.e. a mapping from string literals to types (record kinds).

Here is a mock example implemented in TypeScript, which type-checks properly:

class Column {
    mean(): number {
        return 0;
    }
}

class GeoColumn extends Column {
    length(): number {
        return 0;
    }
}

class Dataframe<T extends { [key: string]: Column} > {

    constructor(private cols: T) {

    }

    getColumn<K extends keyof T>(name: K): T[K] {
        return this.cols[name]
    }
}

const d = new Dataframe({ name: new Column(), location: new GeoColumn() });

d.getColumn("name").mean();
// We can call `length` because this is a GeoColumn
d.getColumn("location").length();
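
For comparison, here is a rough sketch of the closest thing that can be written in Python today, assuming Python 3.8+ for `typing.Literal`. It uses one `@overload` per column name, so the mapping from names to column types has to be spelled out by hand for each frame rather than derived from a generic parameter. `PlacesFrame` and `get_column` are hypothetical names used only for illustration.

```python
from typing import Dict, Literal, overload


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


class PlacesFrame:
    """Hypothetical frame whose column names and types are fixed by hand."""

    def __init__(self, cols: Dict[str, Column]) -> None:
        self._cols = cols

    # One overload per column: a literal name maps to that column's type.
    @overload
    def get_column(self, name: Literal["name"]) -> Column: ...

    @overload
    def get_column(self, name: Literal["location"]) -> GeoColumn: ...

    def get_column(self, name: str) -> Column:
        return self._cols[name]


d = PlacesFrame({"name": Column(), "location": GeoColumn()})

d.get_column("name").mean()
# A type checker should accept `length` here because of the second overload.
d.get_column("location").length()
```

This does not scale: every new frame shape needs its own class and its own set of overloads, which is exactly what a generic record type would avoid.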

Possible Syntaxes

Here are a few possible ways this could be spelled in Python:

self as TypedDict

Since we already have a TypedDict construct, one of the least invasive approaches is to type self as a TypedDict.

This would probably require anonymous TypedDicts, which were proposed previously (python/mypy#985 (comment)).

It would also require TypedDicts to be able to take generic parameters.

from typing import Dict, Generic, TypeVar, TypedDict


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


T = TypeVar("T", bound=Dict[str, Column])

K = TypeVar("K", bound=str)
V = TypeVar("V", bound=Column)


class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

    # Proposed syntax: an anonymous, generic TypedDict. This does not work today.
    def __getattr__(self: Dataframe[TypedDict({K: V})], name: K) -> V:
        return self.cols[name]


d = Dataframe({"name": Column(), "location": GeoColumn()})

d.name.mean()
d.location.length()
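
For reference, a minimal sketch of how far a named, non-generic TypedDict gets today, assuming Python 3.8+ for `typing.TypedDict`. The name `Cols` is hypothetical. Indexing the stored dict with a string literal should give a checker such as mypy the per-column type, but nothing connects that to attribute access on the dataframe itself, which is the gap this proposal is about.

```python
from typing import Generic, TypeVar, TypedDict


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


class Cols(TypedDict):
    # Hypothetical named record: each key has its own column type.
    name: Column
    location: GeoColumn


T = TypeVar("T")


class Dataframe(Generic[T]):
    def __init__(self, cols: T) -> None:
        self.cols = cols


cols: Cols = {"name": Column(), "location": GeoColumn()}
d = Dataframe(cols)

# Going through `.cols` with a literal key keeps the per-column type.
mean = d.cols["name"].mean()
length = d.cols["location"].length()
```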

Type Level .keys and __getitem__

Another option would be to mirror how TypeScript does this, by introducing type-level keys and __getitem__ functions. This would also require generics to be able to depend on other generics (python/mypy#2756).

T = TypeVar("T", bound=Dict[str, Column])

# KeyOf and GetItem (below) are proposed type-level operators; they do not exist in typing.
K = TypeVar("K", bound=KeyOf[T])


class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

    def __getattr__(self, name: K) -> GetItem[T, K]:
        return self.cols[name]
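
To make the intended meaning concrete, here is a runtime analogue of the two proposed operators, sketched with `typing.get_type_hints` on a named TypedDict (Python 3.8+). The helpers `key_of` and `get_item` and the record `Cols` are hypothetical; the proposal is for a type checker to do the equivalent statically, which this sketch does not achieve.

```python
from typing import Any, Set, TypedDict, get_type_hints


class Column:
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


class Cols(TypedDict):
    name: Column
    location: GeoColumn


def key_of(record: Any) -> Set[str]:
    """Runtime stand-in for the proposed KeyOf[T]: the record's key names."""
    return set(get_type_hints(record))


def get_item(record: Any, key: str) -> Any:
    """Runtime stand-in for the proposed GetItem[T, K]: the type stored at a key."""
    return get_type_hints(record)[key]


keys = key_of(Cols)
location_type = get_item(Cols, "location")
```

Here `keys` holds the two column names and `location_type` is the `GeoColumn` class, mirroring what `keyof T` and `T[K]` produce in the TypeScript example.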

Conclusion

I would like to have a way to type Dataframes that have different column types in a generic way. This is useful for typing frameworks like Ibis or Pandas.

This is somewhat related to variadic generics I believe (#193). Also related: dropbox/sqlalchemy-stubs#69

Labels: topic: feature (Discussions about new features for Python's type annotations)
