Updated STL casters and py::buffer to use collections.abc #5566

Merged: 16 commits merged into pybind:master on Apr 14, 2025

Conversation

timohl
Contributor

@timohl timohl commented Mar 16, 2025

Description

This updates type hints for the set, map, list and array STL casters, as well as for py::buffer, to use the more generic collections.abc types in convertible arguments.
The casters have been changed to allow types derived from those abstract base classes.

| caster | convert-arg | return/noconvert-arg |
|---|---|---|
| set_caster | collections.abc.Set | set |
| map_caster | collections.abc.Mapping | dict |
| list_caster | collections.abc.Sequence | list |
| array_caster | collections.abc.Sequence | list |
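
To make the table concrete, here is a minimal pure-Python sketch of the same asymmetry: arguments are annotated with the abstract type, return values with the concrete one. The function names are hypothetical and the bodies only stand in for the C++ casters; this is not output generated by pybind11.

```python
import collections.abc


def roundtrip_vector(values: collections.abc.Sequence[int]) -> list[int]:
    # Accepts any Sequence (list, tuple, range, ...) but always returns a list.
    return list(values)


def roundtrip_set(values: collections.abc.Set[int]) -> set[int]:
    # Accepts set, frozenset and other Set implementations; returns a set.
    return set(values)


def roundtrip_map(values: collections.abc.Mapping[str, int]) -> dict[str, int]:
    # Accepts any Mapping; returns a plain dict.
    return dict(values)
```

A caller can therefore pass a tuple or a frozenset, but should only rely on getting the concrete built-in type back.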

Old description:

For map_caster this is exactly how the caster works.
Unfortunately, list_caster, set_caster and array_caster work a bit differently regarding noconvert:
For arguments, list_caster and array_caster always allow a sequence and set_caster always allows anyset, but all three allow any iterable in convert mode.

The current system only distinguishes between an input and an output type (io_name) and falls back to the output type for noconvert arguments.
These casters' behavior would require three different type hints: arg-convert, arg-noconvert, return.
Therefore, I currently see no way to improve these type hints further without deeper changes.
So for now, I think this should be a good compromise for most use cases.
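
The limitation described above can be modelled in a few lines of Python. This is only a sketch of the described behaviour: `IoName` and `hint_for` are hypothetical names, not part of pybind11.

```python
from typing import NamedTuple


class IoName(NamedTuple):
    """Hypothetical model of a two-slot type hint (input name, output name)."""

    arg_hint: str     # shown for convertible arguments
    return_hint: str  # shown for return values and for noconvert arguments


def hint_for(name: IoName, *, is_return: bool, convert: bool = True) -> str:
    # Only two slots exist, so noconvert arguments reuse the return hint.
    if is_return or not convert:
        return name.return_hint
    return name.arg_hint


LIST_CASTER = IoName("collections.abc.Sequence", "list")
```

With only two slots, a noconvert argument is documented as `list`, even though the caster would still accept other sequences there; a third slot would be needed to express that.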

During testing I found that map_caster and set_caster did not always allow types derived from collections.abc.Mapping and collections.abc.Set, since the checks contained additional restrictions (like requiring an items() method for mappings), while the list caster already allowed types derived from collections.abc.Sequence.
This PR changes the behavior of those casters to allow types derived from those base classes.
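
As an illustration of the gap for sets, the following sketch defines a class that derives from collections.abc.Set without being a set or frozenset. The class name `MinimalSetLike` is made up for this example and is not taken from the pybind11 test suite.

```python
import collections.abc


class MinimalSetLike(collections.abc.Set):
    """Implements only the three abstract methods required by Set."""

    def __init__(self, *items):
        self._items = frozenset(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = MinimalSetLike(1, 2, 3)
is_builtin_set = isinstance(s, (set, frozenset))  # what a set/frozenset-only check sees
is_abstract_set = isinstance(s, collections.abc.Set)  # what the type hint promises
```

A check limited to set and frozenset rejects `s`, while a check against the abstract base class accepts it.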

Additionally, the array_caster was updated to match the typing.Annotated style of numpy/eigen type hints.

As suggested by @InvincibleRMC, the type hint Buffer was changed to collections.abc.Buffer since Buffer does not exist in the typing module.

Resolves #5498

Suggested changelog entry:

Added support for collections.abc in type hints and convertible checks of STL casters and py::buffer

@timohl
Contributor Author

timohl commented Mar 17, 2025

The failing check is unrelated, I think (maybe rerun is enough).

@timohl
Contributor Author

timohl commented Mar 17, 2025

After seeing #5498 and digging deeper into the caster code, I noticed that I have to think more about this.

These three functions restrict the casters further than I thought:

inline bool PyObjectTypeIsConvertibleToStdVector(PyObject *obj) {
    if (PySequence_Check(obj) != 0) {
        return !PyUnicode_Check(obj) && !PyBytes_Check(obj);
    }
    return (PyGen_Check(obj) != 0) || (PyAnySet_Check(obj) != 0)
           || PyObjectIsInstanceWithOneOfTpNames(
               obj, {"dict_keys", "dict_values", "dict_items", "map", "zip"});
}

inline bool PyObjectTypeIsConvertibleToStdSet(PyObject *obj) {
    return (PyAnySet_Check(obj) != 0) || PyObjectIsInstanceWithOneOfTpNames(obj, {"dict_keys"});
}

inline bool PyObjectTypeIsConvertibleToStdMap(PyObject *obj) {
    if (PyDict_Check(obj)) {
        return true;
    }
    // Implicit requirement in the conditions below:
    // A type with `.__getitem__()` & `.items()` methods must implement these
    // to be compatible with https://docs.python.org/3/c-api/mapping.html
    if (PyMapping_Check(obj) == 0) {
        return false;
    }
    PyObject *items = PyObject_GetAttrString(obj, "items");
    if (items == nullptr) {
        PyErr_Clear();
        return false;
    }
    bool is_convertible = (PyCallable_Check(items) != 0);
    Py_DECREF(items);
    return is_convertible;
}

For example, the set check requires the object to be a set or frozenset (or a subtype of either) or a dict_keys view, if I understand correctly.
The caster itself uses the Mapping protocol though and could easily be changed to fully allow it.
I will add some tests to better map out what is allowed and what not and how this relates to the type hints.
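
For readers less familiar with the C API, the first of the three checks above can be approximated in Python as follows. This is a rough model, not a faithful port: it assumes that `PySequence_Check` is true for non-dict types that define `__getitem__`, and that `PyObjectIsInstanceWithOneOfTpNames` compares the name of the object's type. The function name is hypothetical.

```python
import types

_ITERABLE_TYPE_NAMES = {"dict_keys", "dict_values", "dict_items", "map", "zip"}


def convertible_to_std_vector(obj) -> bool:
    # Approximates PySequence_Check: types with __getitem__, except dict subclasses.
    if hasattr(type(obj), "__getitem__") and not isinstance(obj, dict):
        # Strings and bytes are sequences but are deliberately excluded.
        return not isinstance(obj, (str, bytes))
    return (
        isinstance(obj, types.GeneratorType)
        or isinstance(obj, (set, frozenset))
        or type(obj).__name__ in _ITERABLE_TYPE_NAMES
    )
```

The model makes the restriction visible: a list iterator, for example, is iterable but matches none of the branches.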

Git blame directed me to #4686, which seems to have more insight.
@rwgk if you remember this PR, I would love to hear your view.
If not, I will dig into this PR on the weekend and summarize my findings here.

@rwgk
Collaborator

rwgk commented Mar 18, 2025

@timohl Did you see these already?

These three functions restrict the casters further than I thought:

I'm fine if you want to work on those functions. I'm thinking it's best to keep the current logic, which is super fast, but where we're currently returning false, add additional sophisticated conditions as needed.
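
The structure suggested here (keep the cheap check first, and only fall back to a more expensive one where the function would otherwise return false) might look like this in Python. The function name is hypothetical, and the real checks are written in C++ against the C API.

```python
import collections.abc
import types


def object_is_convertible_to_map(obj) -> bool:
    # Fast path: exact dicts and dict subclasses are accepted immediately.
    if isinstance(obj, dict):
        return True
    # Slower fallback, reached only where the fast path fails:
    # accept anything that derives from or is registered with Mapping.
    return isinstance(obj, collections.abc.Mapping)


read_only_view = types.MappingProxyType({"a": 1})
```

The common case stays as fast as before, and only non-dict objects pay for the abstract base class lookup.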

@timohl timohl marked this pull request as draft March 18, 2025 15:25
Collaborator

@rwgk rwgk left a comment

I only glanced through very quickly. Is this still in draft mode intentionally?

@timohl
Contributor Author

timohl commented Mar 25, 2025

I only glanced through very quickly. Is this still in draft mode intentionally?

I would like to improve the comments before finalizing and being ready to merge.
Unfortunately, I was pretty busy the last couple of days and could not find enough time.
I can probably get back to it tomorrow.

Also, your comment about the function names sounds good. I will change that.

@rwgk
Collaborator

rwgk commented Mar 25, 2025

No rush, at all, from my end. I just wanted to be sure you're not waiting for my feedback.

@InvincibleRMC
Contributor

Would it also be possible to update Buffer type to collections.abc.Buffer? More info here.

@timohl
Contributor Author

timohl commented Mar 26, 2025

Would it also be possible to update Buffer type to collections.abc.Buffer? More info here.

Going through the code here:

template <>
struct handle_type_name<object> {
    static constexpr auto name = const_name("object");
};
template <>
struct handle_type_name<list> {
    static constexpr auto name = const_name("list");
};
template <>
struct handle_type_name<dict> {
    static constexpr auto name = const_name("dict");
};
template <>
struct handle_type_name<anyset> {
    static constexpr auto name = const_name("Union[set, frozenset]");
};
template <>
struct handle_type_name<set> {
    static constexpr auto name = const_name("set");
};
template <>
struct handle_type_name<frozenset> {
    static constexpr auto name = const_name("frozenset");
};
template <>
struct handle_type_name<str> {
    static constexpr auto name = const_name("str");
};
template <>
struct handle_type_name<tuple> {
    static constexpr auto name = const_name("tuple");
};
template <>
struct handle_type_name<bool_> {
    static constexpr auto name = const_name("bool");
};
template <>
struct handle_type_name<bytes> {
    static constexpr auto name = const_name(PYBIND11_BYTES_NAME);
};
template <>
struct handle_type_name<buffer> {
    static constexpr auto name = const_name("Buffer");
};
template <>
struct handle_type_name<int_> {
    static constexpr auto name = io_name("typing.SupportsInt", "int");
};
template <>
struct handle_type_name<iterable> {
    static constexpr auto name = const_name("Iterable");
};
template <>
struct handle_type_name<iterator> {
    static constexpr auto name = const_name("Iterator");
};
template <>
struct handle_type_name<float_> {
    static constexpr auto name = io_name("typing.SupportsFloat", "float");
};
template <>
struct handle_type_name<function> {
    static constexpr auto name = const_name("Callable");
};
template <>
struct handle_type_name<handle> {
    static constexpr auto name = handle_type_name<object>::name;
};
template <>
struct handle_type_name<none> {
    static constexpr auto name = const_name("None");
};
template <>
struct handle_type_name<sequence> {
    static constexpr auto name = const_name("Sequence");
};
template <>
struct handle_type_name<bytearray> {
    static constexpr auto name = const_name("bytearray");
};
template <>
struct handle_type_name<memoryview> {
    static constexpr auto name = const_name("memoryview");
};
template <>
struct handle_type_name<slice> {
    static constexpr auto name = const_name("slice");
};
template <>
struct handle_type_name<type> {
    static constexpr auto name = const_name("type");
};
template <>
struct handle_type_name<capsule> {
    static constexpr auto name = const_name("types.CapsuleType");
};
template <>
struct handle_type_name<ellipsis> {
    static constexpr auto name = const_name("ellipsis");
};
template <>
struct handle_type_name<weakref> {
    static constexpr auto name = const_name("weakref");
};

There are a bunch of other types that could be changed:

  • Union[set, frozenset] -> typing.Union[set, frozenset] or maybe better set | frozenset
  • Buffer -> collections.abc.Buffer
  • Iterable -> collections.abc.Iterable
  • Iterator -> collections.abc.Iterator
  • Callable -> collections.abc.Callable
  • Sequence -> collections.abc.Sequence
  • ellipsis -> types.EllipsisType

@InvincibleRMC Would you agree with those? Am I missing some?

@InvincibleRMC
Contributor

InvincibleRMC commented Mar 26, 2025

Currently stub generators typically know that Iterable and the other types are available in the typing module. However, this doesn't apply to Buffer since it does not exist in the typing module. If we determine it is better to make all the types explicit (in the form of foo.bar.Baz) we should also update all the types found in typing.h.
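
The difference between Buffer and the other bare names can be checked directly. The helper below is a small sketch (its name is made up) that reports which of the two modules exposes a given name on the running interpreter.

```python
import collections.abc
import typing

BARE_NAMES = ["Buffer", "Iterable", "Iterator", "Callable", "Sequence"]


def home_modules(name: str) -> list:
    """Return the modules (of typing and collections.abc) that define `name`."""
    homes = []
    if hasattr(typing, name):
        homes.append("typing")
    if hasattr(collections.abc, name):
        homes.append("collections.abc")
    return homes


report = {name: home_modules(name) for name in BARE_NAMES}
```

On the Python versions I am aware of, `report["Iterable"]` lists both modules, while `report["Buffer"]` never contains `"typing"`, which is why a stub generator cannot guess where a bare Buffer comes from.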

@timohl
Contributor Author

timohl commented Apr 4, 2025

I have changed Buffer to collections.abc.Buffer.
If all deprecated types from typing (or without any module reference) should be changed to collections.abc.*, this should probably go into a separate PR.

The convertible check functions now contain some comments to make it more obvious what is allowed and what is not.
Those functions have also been renamed to object_is_*.

I removed the explicit checks for methods required by the collections.abc base classes (e.g., __getitem__), since those base classes already enforce these methods on instantiation (see this quick test in interactive Python):

>>> import collections.abc
>>> class FakeSeq(collections.abc.Sequence): ...
...
>>> a = FakeSeq()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class FakeSeq with abstract methods __getitem__, __len__
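
For contrast, a subclass that does implement the two abstract methods can be instantiated and inherits the remaining Sequence behaviour as mixin methods. The class below is a minimal sketch with a made-up name, not the helper used in the pybind11 tests.

```python
import collections.abc


class MinimalSeq(collections.abc.Sequence):
    """Implements only __getitem__ and __len__; the rest comes from Sequence."""

    def __init__(self, *items):
        self._items = tuple(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


seq = MinimalSeq(1, 2, 3)
as_list = list(seq)          # __iter__ is a mixin built on __getitem__
position = seq.index(2)      # index() is a mixin as well
contains_three = 3 in seq    # so is __contains__
```

Because instantiation already guarantees that `__getitem__` and `__len__` exist, an isinstance check against the base class is enough on the caster side.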

@timohl timohl marked this pull request as ready for review April 4, 2025 22:26
@timohl timohl requested a review from rwgk April 4, 2025 22:53
@timohl
Contributor Author

timohl commented Apr 4, 2025

I have just updated the PR description as well.

@timohl timohl changed the title Updated STL type hints to use collections.abc Updated STL casters and py::buffer to use collections.abc Apr 4, 2025
Collaborator

@rwgk rwgk left a comment

The production code is great as-is! I only have suggestions for simple changes to the test code.

Here is another small suggestion generated by ChatGPT:

The PR includes comments explaining the rationale behind the changes, particularly the shift to using collections.abc. These comments are clear and provide valuable context. However, ensuring consistency in terminology (e.g., consistently referring to collections.abc.Set rather than alternating with collections.Set) would enhance clarity.

"a": 1,
"b": 2,
"c": 3,
}
Collaborator

Might be nice to keep this more compact via

    a1b2c3 = {"a": 1, "b": 2, "c": 3}

and then reuse three times.

Contributor Author

@timohl timohl Apr 13, 2025

Implemented in 6fc2ab3

I just made this change for the mapping test.
In the sequence test, there is now:

pybind11/tests/test_stl.py

Lines 619 to 623 in 383dcb5

assert m.roundtrip_std_vector_int_noconvert(FormalSequenceLike(1, 2, 3)) == [
    1,
    2,
    3,
]

Should I change this to use a variable as well (like list123)?

@timohl
Contributor Author

timohl commented Apr 13, 2025

The production code is great as-is! I only have suggestions for simple changes to the test code.

Thanks for the review. I added some commits to address your comments.

Here is another small suggestion generated by ChatGPT:

* https://chatgpt.com/share/67fb2d28-21d4-8008-bfea-597507977bfb

The PR includes comments explaining the rationale behind the changes, particularly the shift to using collections.abc. These comments are clear and provide valuable context. However, ensuring consistency in terminology (e.g., consistently referring to collections.abc.Set rather than alternating with collections.Set) would enhance clarity.

I am not sure though what chatgpt means here.
Searching for the term "collections" in the entire code base, I could not find any inconsistency in terminology.
Either collections.abc is used correctly or the term is used in another context as far as I can see.

Collaborator

@rwgk rwgk left a comment

I am not sure though what chatgpt means here.
Searching for the term "collections" in the entire code base, I could not find any inconsistency in terminology.
Either collections.abc is used correctly or the term is used in another context as far as I can see.

Sorry, I didn't double-check the ChatGPT finding; must be a hallucination then. Thanks for checking!

@rwgk rwgk merged commit ee04df0 into pybind:master Apr 14, 2025
65 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Apr 14, 2025
@gentlegiantJGC
Contributor

@timohl @InvincibleRMC @rwgk
collections.abc.Buffer was only added in Python 3.12 according to the documentation.
https://docs.python.org/3/library/collections.abc.html#collections.abc.Buffer
From my understanding pybind11 supports versions older than that.
Is it possible to use collections.abc.Buffer for 3.12+ and typing_extensions.Buffer for versions older than that?
I don't know how this information could be accessed at compile time.

@timohl
Contributor Author

timohl commented May 12, 2025

collections.abc.Buffer is only used in type annotations.
With PEP 563 Postponed evaluation of annotations (from __future__ import annotations) this should not raise any error at runtime.

Tested in Python 3.8:

>>> import collections.abc
>>> def b(x: collections.abc.Buffer) -> float:
...     return 4
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'collections.abc' has no attribute 'Buffer'
>>> from __future__ import annotations
>>> def b(x: collections.abc.Buffer) -> float:
...     return 4
... 
>>> 

Stubs generated by pybind11-stubgen contain from __future__ import annotations, so those should work with older Python versions as well.
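
The same effect can be shown without the future import by quoting the annotations by hand, which also stores them as plain strings. The sketch below additionally shows where things can still go wrong: any tool that evaluates the strings depends on the running Python version. The helper name `hints_resolve` is made up for this example.

```python
import collections.abc
import typing


def b(x: "collections.abc.Buffer") -> "float":
    # The annotation is stored as a string and is not evaluated at definition time.
    return 4.0


def hints_resolve(func) -> bool:
    """Return True if the string annotations of `func` can be evaluated."""
    try:
        typing.get_type_hints(func)
    except (AttributeError, NameError):
        return False
    return True


stored = b.__annotations__["x"]
resolvable = hints_resolve(b)
```

Defining and calling `b` works on every version, but `resolvable` is only true where collections.abc.Buffer actually exists.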

Do you have a use case or tool where this creates a problem?
Otherwise, I would prefer not to add any additional version checks, especially since typing.Annotated is used a lot and was introduced in Python 3.9, while pybind11 still supports 3.8.

@gentlegiantJGC
Contributor

You are correct that there are no runtime issues but a static type checker in 3.11 would error because collections.abc.Buffer does not exist in that version.
Is there a way to check which python version is being compiled against and switch the behaviour based on that?

@timohl
Contributor Author

timohl commented May 12, 2025

Ok, I see. Pylance and mypy are complaining, that is true.

nanobind uses a central place to define typing types depending on the Python version:
https://github.com/wjakob/nanobind/blob/62fc996018d9ea4d51af9c86cf008c2562b4eeab/include/nanobind/nb_defs.h#L95-L127

Maybe something similar would be good instead of having version checks all over the place.
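
As a rough sketch of what such a central place could decide, the following Python function picks a qualified name per Python version. It is only a model: pybind11 would have to make this choice in C++ at compile time, the table and function names are hypothetical, and none of this is taken from nanobind. The version numbers are the ones mentioned in this thread (Buffer in 3.12, Annotated in 3.9).

```python
import sys

# name -> (first Python version whose standard library provides it, stdlib module)
STDLIB_HOMES = {
    "Buffer": ((3, 12), "collections.abc"),
    "Annotated": ((3, 9), "typing"),
}


def qualified_hint(name: str, version=sys.version_info) -> str:
    """Pick the stdlib name if available, else the typing_extensions backport."""
    since, module = STDLIB_HOMES[name]
    if tuple(version[:2]) >= since:
        return f"{module}.{name}"
    return f"typing_extensions.{name}"
```

Keeping the version table in one spot means each type hint only names what it needs, and the version logic is not repeated at every use site.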

@gentlegiantJGC
Contributor

My personal preference would be to keep it where it is used, unless it is used in multiple places, but I am not a developer here.
I think this issue applies to a number of other type hints as well.

Successfully merging this pull request may close these issues.

[BUG]: STL casters should probably type cast from collections.abc