How to accept dicts with extra keys? #4617

Closed
john375639 opened this issue Feb 21, 2018 · 32 comments · Fixed by #14225

Comments

@john375639

A lot of times I pass data around using dictionaries. When I receive the data, for example as a function argument, what I want is a dict that contains some specific keys. I don't mind if the dict has more data.

So, I tried typed dicts:

from mypy_extensions import TypedDict

Movie = TypedDict('Movie', {'name': str, 'year': int})

This works:

movie = {'name': 'Blade Runner', 'year': 1982}  # type: Movie

But this throws the error Extra key 'has_sequel' for TypedDict "Movie":

movie = {'name': 'Blade Runner', 'year': 1982, 'has_sequel': True}  # type: Movie

I understand that the second value can't simply stand in for the first, because the result of keys() or items() is different.

But if I am only interested in having those keys, not iterating or other stuff, what are my options (if any)?

@ilevkivskyi
Member

ilevkivskyi commented Feb 21, 2018

There is a cool new feature called total=False, does it work for you?
Here are some docs http://mypy.readthedocs.io/en/latest/kinds_of_types.html#totality

@gvanrossum
Member

This looks like a feature request for TypedDict.

@ilevkivskyi
Member

@gvanrossum Actually I am quite sure total=False covers this, or am I missing something?

@ilevkivskyi
Member

Hm, I see one problem: potentially, the list of optional keys can be very long (since one still needs to list them).

@gvanrossum
Member

Sadly no -- total=False allows omitting keys, it doesn't allow extra keys.

# ...imports...
A = TypedDict('A', {'x': int}, total=False)
def f(a: A) -> None:
  print(a['x'])
b: A = {'x': 0, 'y': 0}  # E: Extra key 'y' for TypedDict "A"

@emmatyping
Collaborator

Yes, I think this is a valid feature request, as enumerating keys in the TypedDict could be painful.

@JukkaL
Collaborator

JukkaL commented Feb 22, 2018

This is already supported for function arguments:

from mypy_extensions import TypedDict

A = TypedDict('A', {'x': int})
B = TypedDict('B', {'x': int, 'y': str})

def f(x: A) -> None: ...

b: B = {'x': 1, 'y': 'foo'}
f(b)  # Ok

The issue seems specific to creating a typed dict -- mypy makes sure that no extra keys are provided (as these could be accidental typos or obsolete key names, for example).

I don't think that changing the semantics of total=False is a good idea, since we'd lose a lot of type safety. One option would be to introduce a new flag. Let's call it allow_extra=True for now. Here is an attempt to define it:

  • Don't complain about extra keys when creating a typed dict object if allow_extra is true for the type. Allow arbitrary extra keys and values.
  • Allow accessing arbitrary keys, such as x['whatever']. The value type would have to be Any unless the key is explicitly defined for the typed dict.
  • Allow setting arbitrary keys with arbitrary values, such as x['whatever'] = <anything>.

These semantics seem kind of arbitrary to me. To seriously consider such a feature I'd like to see more concrete examples where the current behavior is not sufficient. If the structure of a typed dict object is not well specified, the current recommendation is to use Dict[str, Any] instead. TypedDict is pretty restricted by design.

If we had intersection types, a similar effect to my proposed allow_extra flag could perhaps be achieved through Intersection[MyTypedDict, Dict[str, Any]]. We don't have any concrete plans to introduce intersection types, however.

@john375639
Author

Thanks.

I think that I can work with the current behavior.

JukkaL's example is what I am trying to do. Something like:

from typing import Iterable
from mypy_extensions import TypedDict

NamedThing = TypedDict('NamedThing', {'name': str})
Movie = TypedDict('Movie', {'name': str, 'year': int})
Replicant = TypedDict('Replicant', {'name': str, 'model': str})

def slug(x: NamedThing) -> str:
    return x['name'].lower().replace(' ', '-')

blade_runner: Movie = {'name': 'Blade Runner', 'year': 1982}
roy: Replicant = {'name': 'Roy', 'model': 'Nexus-6'}
things: Iterable[NamedThing] = [blade_runner, roy]

for thing in things:
    print(slug(thing))

When trying Mypy I was directly assigning the values or using variables without annotations, like:

blade_runner: NamedThing = {'name': 'Blade Runner', 'year': 1982}

slug({'name': 'Blade Runner', 'year': 1982})

blade_runner = {'name': 'Blade Runner', 'year': 1982}
slug(blade_runner)
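
A minimal sketch of how the direct-assignment cases above can be made to type check under the current rules, assuming the class-based syntax from `typing` (Python 3.8+) rather than `mypy_extensions`: declare the wider type once and annotate the literal with it, so that `year` is a declared key rather than an extra one, while `slug` still only asks for `NamedThing`.

```python
from typing import TypedDict


class NamedThing(TypedDict):
    name: str


class Movie(NamedThing):
    # Inherits 'name' and adds 'year'; structurally compatible with NamedThing.
    year: int


def slug(x: NamedThing) -> str:
    return x['name'].lower().replace(' ', '-')


# Annotating with the wider type makes 'year' a declared key, not an extra one.
blade_runner: Movie = {'name': 'Blade Runner', 'year': 1982}
print(slug(blade_runner))
```

The cost is one extra type per shape of dict, which is the enumeration burden discussed earlier in the thread.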

@JukkaL
Collaborator

JukkaL commented Feb 26, 2018

I'm closing this since it seems that the current behavior is mostly good enough, and the solution to the larger issue would be messy.

JukkaL closed this as completed Feb 26, 2018
achimnol added a commit to lablup/backend.ai-common that referenced this issue Sep 8, 2019
* Now we use TypedDict to validate JSON-based dictionaries at the time
  of code writing.

* Currently TypedDict has some limitations, but they are issues only
  during "static" type checks, not at runtime.

  - It does not allow extra keys (in contrast to trafaret.Dict)
    ref: python/mypy#4617

  - dict is not a subtype of TypedDict; instead we should use Mapping
    and MutableMapping.
@alexjurkiewicz
Contributor

I would like this issue to be re-opened. I think extra_keys is important to have.

The current solution of Dict[str, Union[str, int, float, bool]] is significantly less expressive than what other languages offer.

Typescript example:

// Simple example
interface ElasticsearchDocument {
    _id: string
    _type: string
    _index: string
    "@timestamp": number
    [key: string]: string|number|boolean;
}

// Complex example: Nested dict
interface SubObject {
  [key: string]: string|number|boolean|SubObject
}
interface DynamoDBDocument {
  index_id: number
  secondary_index_id: number
  [key: string]: string|number|boolean|SubObject
}

The above suggestion of allow_extra still has a capability gap where I cannot define the type of extra items (either the key or value).

I don't have a good sense of the restrictions on syntax that MyPy has to deal with. So this proposal might be totally unreasonable. But the syntax could be:

TypedDict("ElasticsearchDocument",
          {"_id": str},
          extra_keys={str: Union[str,int,float,bool]})

This says: "this is a dictionary with _id as a guaranteed key. There might be other keys; they have string type and their values match Union[str,int,float,bool]."

Theoretically you could spec something like this:

TypedDict("ComplexDict",
          {1: str, "a": int},
          extra_keys={int: bool, str: str})

@tuukkamustonen

I would also like to see extra keys allowed. My use case is in a HTTP/REST client:

class SomeApiClient:

    class RegisterRequest(TypedDict):
        ...

    class RegisterResponse(TypedDict):
        ...

    def register(self, payload: RegisterRequest) -> RegisterResponse:
        ...
        return resp_json
  1. As payload - this is an API that already exists, and it accepts both fixed/defined and arbitrary key-value pairs at the top level. This cannot be mapped currently. Nesting dynamic data under an extra key or similar:

    class RegisterRequest(TypedDict):
        static_field: str
        extra: dict

    ...cannot be done, because the API already exists and cannot be changed just like that.

  2. As response - the server returns some irrelevant fields, which are really not worth mapping (but calling code might be interested in them). Or maybe the server adds a new field that the caller is interested in, before RegisterResponse is updated in the next version.

I would happily risk typoing (as explained in #6223 (comment)) just to have extra keys (it would be optional feature, anyway).
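
One possible stopgap for the response case, sketched under assumptions: the field names `user_id` and `status` and the helper `split_response` are hypothetical, not part of any real API. The declared keys are copied into a typed dict and everything else is returned separately, so calling code can still reach the unmapped fields.

```python
from typing import Any, Dict, Mapping, Tuple, TypedDict


class RegisterResponse(TypedDict):
    # Hypothetical field names, for illustration only.
    user_id: int
    status: str


def split_response(raw: Mapping[str, Any]) -> Tuple[RegisterResponse, Dict[str, Any]]:
    """Separate the declared keys from whatever else the server sent."""
    known: RegisterResponse = {
        'user_id': int(raw['user_id']),
        'status': str(raw['status']),
    }
    declared = RegisterResponse.__annotations__
    extras = {k: v for k, v in raw.items() if k not in declared}
    return known, extras


known, extras = split_response({'user_id': 7, 'status': 'ok', 'trace_id': 'abc'})
print(known, extras)
```

This does not help with the request case, where the extra keys have to live at the top level of the same dict.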

@septatrix
Contributor

This might be solvable using Protocols, Literals and __getitem__. One could have a Protocol class with a generic __getitem__ and many @overloads of said method using a Literal key and the corresponding return type. Whilst this would require many overloads it could work.

@ML-Chen

ML-Chen commented Sep 3, 2020

Can this issue be reopened?

@jlumbroso

@JukkaL As @alexjurkiewicz mentions, the advantage of allowing extra keys is that TypedDict then functions as a sort of poor person's interface: "You need to provide at least these fields", with the implication that extra keys would be ignored.

From a practical perspective: Python is a language that integrates many different systems, protocols, etc., and it's not infrequent for various protocols to have slightly different extensions (the point of standardization is to avoid that, but it has seemed inevitable). Since a lot of protocols communicate in JSON, having TypedDicts that expect several fields for sure but allow any other fields is useful. This is also the case when you have several versions of a protocol: you make the newer one a superset of the older one to be backwards compatible.

This is not a duplication of the total flag: The total flag is a global way to address the possibility of optional fields.

So you could have total=False and extra=True for when you allow for certain fields to be optional—and are providing the type just as guide of which fields can be affected/tuned—but also to ignore any superset.

From a programming language theoretical perspective: The topic of typed extensible records is not new, but dates to Mitchell & Wand in the 70s. A recent paper about this can be found here:

You may be amused to know the paper begins this way 😊🤗:

Records and variants provide flexible ways to construct datatypes, but the restrictions imposed by practical type systems can prevent them from being used in flexible ways. These limitations are often the result of concerns about efficiency or type inference, or of the difficulty in providing accurate types for key operations.

It then goes on:

Unfortunately, practical languages are often less flexible in the operations that they provide to manipulate
records and variants. For example, many languages—from C to Standard ML (SML)—will only allow the programmer to select the l component, r.l, from a record r if the type of r is uniquely determined at compile-time.

@gvanrossum I suspect having the extra field would enable the kind of polymorphism that this paper encourages, which is actually inherent in Python's original "duck" typing and is precisely what made Python such an attractive and sane language to use for so many people.

@MicahLyle

A use case that I think I have with this is integrating with a legacy MongoDB API where a given record could have a bunch of key/value pairs, but I'm really only concerned with say 3-4 of them in the codebase I'm working with. I want to strongly type the 3-4 that I'm concerned with, and not worry about the other ones but rather just pass them around.
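
A rough sketch of one way to do this today, assuming the record arrives as a plain `Dict[str, Any]`: `typing.cast` gives the checker a typed view of the few keys that matter while the same object, extra keys included, is passed along. The names `UserFields` and `deactivate` are made up for illustration, and `cast` performs no runtime validation, so this is only as safe as the data.

```python
from typing import Any, Dict, TypedDict, cast


class UserFields(TypedDict):
    # Hypothetical subset of a much larger legacy record.
    email: str
    active: bool


def deactivate(record: Dict[str, Any]) -> Dict[str, Any]:
    # cast() is a no-op at runtime; it only tells the checker which keys we rely on.
    user = cast(UserFields, record)
    user['active'] = False  # checked: the key is declared and the value must be bool
    return record  # same object, with the untouched extra keys still present


doc = {'email': 'a@example.com', 'active': True, 'legacy_flags': [1, 2]}
result = deactivate(doc)
print(result)
```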

@hauntsaninja
Collaborator

Linking this discussion on typing-sig https://mail.python.org/archives/list/typing-sig@python.org/thread/66RITIHDQHVTUMJHH2ORSNWZ6DOPM367/#QYOBBLTWVSEWMFRRHBA2OPR5QQ4IMWOL

@JukkaL
Collaborator

JukkaL commented Jan 6, 2021

As mentioned in the typing-sig thread, this probably requires a PEP. The author of the PEP should figure out how the subtyping rules would need to be extended and some other technical details. I'm not against making TypedDicts more flexible with a flag that allows arbitrary keys with Any values, and there seems to be quite a lot of interest in this. However, I don't care about this quite enough to write a PEP. If anybody wants to volunteer, I can probably find time to at least give feedback on the PEP draft.

dcbaker added a commit to dcbaker/meson that referenced this issue Aug 10, 2021
This is really, really annoying. What we really want is (pseudocode):
```python
class SubValues(TypedDict[str, str], total=False):

    @INPUT@: T.List[str]
    @OUTPUT@: T.List[str]
```

Which would specify that `@INPUT@` and `@OUTPUT@` *may* be present and
if they are, then they are lists. There may be additional keys, which
have a single string as a value. Of course there is currently no way to
do that with typing, so we have to instead take a union type and then
use asserts to help the type checker understand this.

More info: python/mypy#4617
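
The union-plus-asserts workaround described in this commit message might look roughly like the following. This is a sketch, not the actual meson code: the alias name mirrors the pseudocode, while `first_input` and the `@PLAINNAME@` entry are illustrative.

```python
from typing import Dict, List, Union

# Values are either a single string or, for the two special keys, a list of strings.
SubValues = Dict[str, Union[str, List[str]]]


def first_input(values: SubValues) -> str:
    inputs = values['@INPUT@']
    # The checker only knows Union[str, List[str]]; the assert narrows it to a list.
    assert isinstance(inputs, list)
    return inputs[0]


values: SubValues = {'@INPUT@': ['a.c', 'b.c'], '@PLAINNAME@': 'a.c'}
print(first_input(values))
```
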
dcbaker added a commit to dcbaker/meson that referenced this issue Aug 12, 2021
@ciaransweet

@JukkaL I am unfamiliar with all the places requests etc. go - Are you aware of this being discussed/worked on more recently?

I'm looking for the same functionality, TypedDict where it allows arbitrary other keys.

Thanks!

@peterdeme

Does anyone have a good solution for that?

@amin-nejad

For the time being, if you know what these extra keys are that may or may not be in the dictionary ahead of time, you can include them in the definition but mark them as NotRequired (see PEP 655)
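
A minimal sketch of that approach, assuming Python 3.11+ (or `typing_extensions` on older interpreters) and reusing the `has_sequel` key from the original question:

```python
try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older interpreters, assuming typing_extensions is installed
    from typing_extensions import NotRequired, TypedDict


class Movie(TypedDict):
    name: str
    year: int
    has_sequel: NotRequired[bool]  # may be absent, but is typed when present


with_key: Movie = {'name': 'Blade Runner', 'year': 1982, 'has_sequel': True}
without_key: Movie = {'name': 'Blade Runner', 'year': 1982}

print(sorted(Movie.__required_keys__), sorted(Movie.__optional_keys__))
```

Keys that are not listed at all are still rejected, so this only covers extras that are known in advance.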

@Rocamonde

I would also find this very useful. TypedDicts are very limiting without this. I would have to go back to using a generic Dict type instead of giving some information on keys that will certainly be available.

@JoaquimEsteves
Contributor

I second the notion that this issue be re-opened.

Coming over from Typescript this has been my biggest pain point with mypy and Python's typing in general.

Discovering this was a helpful surprise, but it's still not good enough - I'm forced to cast dictionaries I'm quite sure fulfill the TypedDict's conditions, which feels like an admission of defeat.

To bring it back into the scope of mypy (instead of delegating it to PEP) I'd suggest adding a new configuration flag.

--allow-extra-keys

@ilevkivskyi
Member

I don't think we need a new flag for this, I think we can simply:

  • Use a dedicated error code (e.g. typeddict-unknown-key) instead of generic typeddict-item for the cases @JukkaL mentioned above
  • Still type check other (known) keys, if a known one was found

Then people will be able to simply say --disable-error-code=typeddict-unknown-key. If someone submits a PR doing this, I will approve it (or maybe I can do it myself this or next weekend, but can't say for sure).

ilevkivskyi reopened this Nov 30, 2022
JoaquimEsteves added a commit to JoaquimEsteves/mypy that referenced this issue Nov 30, 2022
See: [python#4617](python#4617)

This allows the following code to trigger the error
`typeddict-unknown-key`

```python
import typing as T

A = T.TypedDict("A", {"x": int})

def f(x: A) -> None:
    ...

f({"x": 1, "y": "foo"})
```

The user can then safely ignore this specific error at their
discretion.
@JoaquimEsteves
Contributor

@ilevkivskyi I submitted a small PR for this.

Feel free to yell at me when you find mistakes; peruse it at your own convenience.

(Apologies if pinging is considered rude)

ilevkivskyi added a commit that referenced this issue Jan 27, 2023
Fixes #4617

This allows the following code to trigger the error
`typeddict-unknown-key`

```python
import typing as T

A = T.TypedDict("A", {"x": int})

def f(x: A) -> None:
    ...

f({"x": 1, "y": "foo"})  # err: typeddict-unknown-key
f({"y": "foo"})  # err: typeddict-unknown-key & typeddict-item
f({"x": 'err', "y": "foo"})  # err: typeddict-unknown-key & typeddict-item

a: A = { 'x': 1 }

# You can set extra attributes
a['extra'] = 'extra' # err: typeddict-unknown-key
# Reading them produces the normal item error
err = a['does not exist'] # err: typeddict-item
```

The user can then safely ignore this specific error at their
discretion.

Co-authored-by: Ivan Levkivskyi <[email protected]>
@ilevkivskyi
Member

For posterity, if you want mypy to not complain about unknown/extra keys in a TypedDict you can use --disable-error-code=typeddict-unknown-key (starting with mypy 1.1.0; it didn't make it into 1.0.0, sorry).
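
As a small illustration (a sketch, with `A` and `f` taken from the commit message above and rewritten in class syntax), the error can also be silenced on a single line instead of globally:

```python
from typing import TypedDict


class A(TypedDict):
    x: int


def f(a: A) -> int:
    return a['x']


# Without any configuration, mypy >= 1.1.0 reports typeddict-unknown-key here.
# Silence it per line, as below, or for the whole run with
#   mypy --disable-error-code=typeddict-unknown-key <files>
result = f({'x': 1, 'y': 'foo'})  # type: ignore[typeddict-unknown-key]
print(result)
```

Known keys are still checked either way, so a wrong value type for `x` continues to be reported as typeddict-item.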

@StefanBrand

For Toblerity/Fiona#1125 we are trying to mimic the GeoJSON specification. While there are some reserved keys like type and coordinates, any other keys are allowed. It is currently not possible to type this with TypedDict because it does not allow extra keys.


This might be solvable using Protocols, Literals and __getitem__. One could have a Protocol class with a generic __getitem__ and many @overloads of said method using a Literal key and the corresponding return type. Whilst this would require many overloads it could work.

I think this actually does not work. Are there working approaches different from the two below?

Two examples:

1. should pass, fails, playground

from typing import Any, Protocol, Literal, overload

class SomeSpecificKeysDict(Protocol):
    @overload
    def __getitem__(self, key: Literal["foo"]) -> str:
        ...
        
    @overload
    def __getitem__(self, key: Literal["bar"]) -> int:
        ...

my_dict: SomeSpecificKeysDict = { "foo": "a string", "bar": 2 }

This should pass, but it errors with

main.py:14: error: Incompatible types in assignment (expression has type "dict[Literal['foo'], str]", variable has type "SomeSpecificKeysDict")  [assignment]
main.py:14: note: Following member(s) of "dict[Literal['foo'], str]" have conflicts:
main.py:14: note:     Expected:
main.py:14: note:         @overload
main.py:14: note:         def __getitem__(self, Literal['foo'], /) -> str
main.py:14: note:         @overload
main.py:14: note:         def __getitem__(self, Literal['bar'], /) -> int
main.py:14: note:     Got:
main.py:14: note:         def __getitem__(self, Literal['foo'], /) -> str
main.py:14: error: Dict entry 1 has incompatible type "Literal['bar']": "int"; expected "Literal['foo']": "str"  [dict-item]
Found 2 errors in 1 file (checked 1 source file)

2. Should fail, passes, playground

from typing import Any, Protocol, Literal, overload

class SomeSpecificKeysDict(Protocol):
    @overload
    def __getitem__(self, key: Literal["foo"]) -> str:
        ...
        
    @overload
    def __getitem__(self, key: Literal["bar"]) -> int:
        ...

    @overload
    def __getitem__(self, key) -> Any:
        ...
        

my_dict: SomeSpecificKeysDict = { "foo": "a string", "bar": "another_string" }

Success: no issues found in 1 source file

@alecov

alecov commented Jul 21, 2023

@JukkaL As @alexjurkiewicz mentions, the advantage of allowing extra keys is that TypedDict then functions as a sort of poor person's interface: "You need to provide at least these fields", with the implication that extra keys would be ignored.

From a practical perspective: Python is a language that integrates many different systems, protocols, etc., and it's not infrequent for various protocols to have slightly different extensions (the point of standardization is to avoid that, but it has seemed inevitable). Since a lot of protocols communicate in JSON, having TypedDicts that expect several fields for sure but allow any other fields is useful. This is also the case when you have several versions of a protocol: you make the newer one a superset of the older one to be backwards compatible.

This is not a duplication of the total flag: The total flag is a global way to address the possibility of optional fields.

So you could have total=False and extra=True for when you allow for certain fields to be optional—and are providing the type just as guide of which fields can be affected/tuned—but also to ignore any superset.

From a programming language theoretical perspective: The topic of typed extensible records is not new, but dates to Mitchell & Wand in the 70s. A recent paper about this can be found here:

You may be amused to know the paper begins this way 😊🤗:

Records and variants provide flexible ways to construct datatypes, but the restrictions imposed by practical type systems can prevent them from being used in flexible ways. These limitations are often the result of concerns about efficiency or type inference, or of the difficulty in providing accurate types for key operations.

It then goes on:

Unfortunately, practical languages are often less flexible in the operations that they provide to manipulate
records and variants. For example, many languages—from C to Standard ML (SML)—will only allow the programmer to select the l component, r.l, from a record r if the type of r is uniquely determined at compile-time.

@gvanrossum I suspect having the extra field would enable the kind of polymorphism that this paper encourages, which is actually inherent in Python's original "duck" typing and is precisely what made Python such an attractive and sane language to use for so many people.

The extra argument discussed here and all the aforementioned insights are, in my opinion, the absolutely correct solution for this issue. The proposal marked as the solution, "throw an error and let users ignore it with a pragma", is a kludge.

The extra argument as proposed correctly encodes the intention that a certain interface allows extensions such as what is done in TypeScript:

{ [key: string]: any; }

The problems with the "ignore errors" approach are obvious:

  1. The type itself does not encode the information that the interface is extensible, which might further cause problems down the road if this information would be useful for a certain typing construct later on.
  2. It's essentially a pragma-based "ignore this warning". Linter warnings exist to tag problematic constructs, but this is not a problematic construct — it is an ordinary and well-written piece of code which does not warrant a warning. Warning ignoring is to be used sporadically in well-justified situations, otherwise it just looks and feels wrong to use it as an ordinary code feature.

Thus I believe this issue should be reopened; the proposed solution is just an unacceptable hack.

@erictraut

@alecov, if you'd like to propose or endorse changes to the Python static type system, the python/typing discussion forum would be a better place than the mypy issue tracker. Mypy is correctly implementing support for TypedDict as it is currently spec'ed. (BTW, I agree with many of your points above.)

@bast1aan

bast1aan commented Jul 31, 2023

I also miss this a lot in the current TypedDict functionality, which makes it unsuitable to use in old code bases where I have to deal with large dicts with only a few keys that are of interest for the code that I am refactoring at that moment. Covering all keys that are not of direct importance would add way too much boilerplate to the code base. The same for using @overloaded __getitem__()s with Literals.

For new code we avoid dicts between (public) functions anyway in favor of dataclasses.

Allowing unknown keys is largely only a problem because of mutability (invariance). When I have that problem with ordinary dicts, I solve it with the Mapping[str, object] type, which disallows the mutable behaviour.
Thus, could some kind of immutable TypedDict, used for return types, argument types, etc., solve this?
For now, it seems that mypy does not allow a type to inherit from (combine) both TypedDict and Mapping to indicate that both are supported.
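
For reference, a minimal sketch of the read-only Mapping[str, object] approach for ordinary dicts mentioned above. The function name `slug` is borrowed from earlier in the thread; the runtime check is an assumption about how one would narrow the value.

```python
from typing import Mapping


def slug(thing: Mapping[str, object]) -> str:
    name = thing['name']
    # Values are typed as object, so narrow before using str methods.
    if not isinstance(name, str):
        raise TypeError("'name' must be a str")
    return name.lower().replace(' ', '-')


# Any dict with string keys is accepted, whatever extra keys it carries.
print(slug({'name': 'Blade Runner', 'year': 1982, 'has_sequel': True}))
```

The trade-off is that the required key is no longer part of the type: a missing or mistyped `name` is only caught at runtime.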

@PIG208

PIG208 commented Sep 17, 2023

I would like this issue to be re-opened. I think extra_keys is important to have.

The current solution of Dict[str, Union[str, int, float, bool]] is significantly less expressive than what other languages offer.

Typescript example:

// Simple example
interface ElasticsearchDocument {
    _id: string
    _type: string
    _index: string
    "@timestamp": number
    [key: string]: string|number|boolean;
}

// Complex example: Nested dict
interface SubObject {
  [key: string]: string|number|boolean|SubObject
}
interface DynamoDBDocument {
  index_id: number
  secondary_index_id: number
  [key: string]: string|number|boolean|SubObject
}

This opens up a hole like this:

interface A {
    value: string;
    [key: string]: string | number;
}

interface B extends A {
    foo: number;
}

const x: B = {value: "asd", foo: 12}
function mut(v: A) {
    v.foo = "asd"
}
mut(x)
const y: B = x
console.log(y)  // {"value": "asd", "foo": "asd"}

If we are bringing this to Python, do we actually want it to be a sound system, if that's even possible?
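
The same hole can be sketched in Python. Under the current rules mypy rejects the write inside `mut` (typeddict-unknown-key), so the ignore comment below stands in for a hypothetical rule that would let `A` carry arbitrary extra keys:

```python
from typing import TypedDict


class A(TypedDict):
    value: str


class B(A):
    foo: int


def mut(v: A) -> None:
    # Rejected by mypy today; the ignore simulates "A allows extra keys".
    v['foo'] = 'asd'  # type: ignore[typeddict-unknown-key]


x: B = {'value': 'asd', 'foo': 12}
mut(x)  # accepted: B is structurally compatible with A
print(x)  # 'foo' now holds a str although B declares int
```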

@ikonst
Contributor

ikonst commented Sep 17, 2023

@PIG208 TypedDicts are, arguably, structural types and allow behaviors that, if you view TypedDicts as nominal types, would appear unsound (e.g. if you think of B as a "base class" of A in the example above). TypeScript docs discuss this.

@mharding-hpe

For those wanting this ticket reopened, I just opened a feature request which may be of interest:
#18176
