Skip to content

Explore building third party stubs as packages #2491

@srittau

Description

@srittau
Collaborator

This is an alternative to #2440 (disallowing third-party stubs). The idea is that typeshed remains/becomes a central repository for third-party stubs that are not bundled with the parent package, similar to DefinitelyTyped. In the future I expect type checkers will not want to bundle all third-party stubs for a variety of reasons, so third-party stubs would be distributed as separate PEP 561 stub-only packages, one per upstream package.

(I tried to integrate points raised there into this issue, especially those by @JukkaL in this comment.)

Advantages

  • Due to typeshed's tests, packages in typeshed will continue to work with the latests versions of mypy and pytype.
  • Basic level of consistency and standards due to review by typeshed maintainers.
  • Consistent naming scheme for third-party stubs packages, allowing users to just "pip install <guessed name>" and it will work when there are stubs.
  • Tooling (like tests) is easier to manage as it can remain part of the typeshed package.
  • Easier to contribute to stubs, since contributors don't need to learn the intricacies of multiple stubs projects.
  • No need to start a separate project just to distribute stubs for a new package.

Issues

  • Workload issues for typeshed maintainers.
  • Typeshed maintainers will often not be familiar with the package for which pull requests are opened.
  • Publishing stubs takes longer due to the necessary reviews.

Further Considerations

What should the generated packages be called? @ethanhs's PEP 561 actually requires stubs-only package to be named <package>-stubs. typeshed could squat these names and release them (and remove the stubs) on the request of upstream maintainers. Alternatively, typeshed could add a common prefix or suffix (ts, typeshed) or in addition to or instead of the -stubs suffix. This would be in violation of PEP 561, so we'd need to get broader consensus to amend the PEP. My personal favorite would be <package>-ts.

To guarantee a fairly quick turnaround on stubs, to minimize work for publishing stubs, and to prevent all third-party stub packages to be updated whenever a new typeshed version is released, stubs for a specific third-party package should be published automatically when it changes.

Possible Implementation

  • Add a generic setup-tp.py to typeshed that takes its package name from the directory it's in and uses the current date and time as version number.
  • Amend the CI process for master only so that after a successful test run, for every third-party package that was changed since the last successful run, the following is done automatically:
    1. Copy setup-tp.py into the third-party module directory as setup.py.
    2. Build the package in that directory.
    3. Upload to pypi.

Activity

emmatyping

emmatyping commented on Sep 27, 2018

@emmatyping
Member

I haven't had time to think over the full proposal yet but I did want to correct a misunderstanding and add some relevant information.

First a minor (but important!) nitpick: PEP 561 is about package names. The things you install off of PyPi are distributions, which have zero or more packages. The example I always give to explain this is that you pip install Django but import django. django is the package, Django is the distribution.

Therefore, it would be possible to install, say setuptools-stubs, along with any number of other stub packages from the typeshed distribution. So we don't really need to squat names.

Speaking of squatting names, there has already been some discussion of this topic in pypi/warehouse#4164. Donald Stufft is planning on namespacing packages in some form, so it is likely flask will end up with flask-stubs as a reserved distribution name anyway.

JukkaL

JukkaL commented on Sep 27, 2018

@JukkaL
Contributor

I like the proposal! Here are a few additional thoughts.

If we split (most) third party stubs to separate stub packages, type checkers could propose a relevant stub package to install if they can't find stubs for some package. This would be easy to do if third party stubs are managed centrally, but hard if they are spread out across numerous repositories.

I think that it would make sense to request permission proactively for the most popular packages on PyPI to be included on typeshed (I mentioned this in #2440). This would make it easier to contribute new stubs, as asking for permission is something many people are uncomfortable doing. I'm happy to try doing this if others think that this is a good idea.

I think that review workload is the biggest potential issue. I'm ready to volunteer to do more code reviews if we can pull some of the proposed changes off, in part because I expect they could well make code reviews easier. I have some ideas about making core review workload manageable below.

First, I think that it would be a big help if we had tests for third party stubs. At the very least, we should have a .py file that exercises the most important functionality in the stub. A test would pass if mypy (or some other tool) could type check the test file without errors. This can be refined later, but I think that this would be sufficient in the beginning.

Second, if we have support for tests, we can require that any new third party stubs have decent test coverage. Also, most fixes to a stub should include a test. Code reviews will be simpler if we can trust that tests catch any egregious errors at least. Also, the presence of tests makes it more likely that the PR author has done a reasonable job, thus hopefully requiring fewer review iterations. We can ask the creator of a new stub to actually verify that the test file can be run against the library (not just checked). It's probably impractical to execute tests in typeshed CI, though.

Third, since third party stubs wouldn't be installed by default, errors in those stubs would be less serious than now, as it would be easy to uninstall the stubs if there are problems, and it would be easy to revert back to an earlier stub version. Reporting bugs in stubs back to typeshed would perhaps also be simpler, as it would be easy to attribute blame to a certain typeshed stub package. Due to these reasons, code reviews could be less strict for third party packages (i.e. no need to double check against the implementation if something is unclear), excepting maybe the most popular packages. Here my rationale is that it will be much easier to encourage users to contribute small fixes to not-very-polished stubs than to create a new set of stubs from scratch. Automatically having the fixes available on PyPI immediately after the PR has been merged could be another big motivating factor.

Finally, maybe we should consider requiring the use of a code auto-formatter for stubs (at least new ones), if there is one that works for stubs. Or we could maybe introduce more aggressive linting for new stubs. These would improve the readability of contributed stubs.

ilevkivskyi

ilevkivskyi commented on Sep 27, 2018

@ilevkivskyi
Member

There are few questions I wanted to clarify about the proposal:

  • What to do with stubs for large frameworks that require plugins to work correctly? (SQLAlchemy, Django, NumPy, etc.)
  • How are we going to manage the versioning of stub packages?
  • Will incomplete stubs be allowed? Will we require __getattr__ for such to avoid false positives, or just not allow them at all?
JukkaL

JukkaL commented on Sep 27, 2018

@JukkaL
Contributor

What to do with stubs for large frameworks that require plugins to work correctly? (SQLAlchemy, Django, NumPy, etc.)

I think that these could be decided on a case-by-case basis. If the stubs don't generate false positives without a plugin, maybe the stubs can be included in typeshed. We don't need to include all stubs in typeshed, but I think that it's a good place for the "long tail" of stubs where nobody may be motivated enough to set up a separate repository just for stubs.

How are we going to manage the versioning of stub packages?

I proposed that we could automatically generate a version number from the last modified date.

Will incomplete stubs be allowed? Will we require __getattr__ for such to avoid false positives, or just not allow them at all?

In my opinion __getattr__ should be acceptable for large or complex packages, but we should try to avoid it for small and medium sized stubs.

JelleZijlstra

JelleZijlstra commented on Sep 27, 2018

@JelleZijlstra
Member

Finally, maybe we should consider requiring the use of a code auto-formatter for stubs (at least new ones), if there is one that works for stubs. Or we could maybe introduce more aggressive linting for new stubs. These would improve the readability of contributed stubs.

https://github.com/ambv/black supports auto-formatting stubs. (I added the support.)

sproshev

sproshev commented on Sep 27, 2018

@sproshev
Contributor

I proposed that we could automatically generate a version number from the last modified date.

It means that version number would have no connection with runtime package version and non-descriptive as a result.
Let's imagine some environment requires numpy=1.14.1 to be installed, how a developer will figure out what stub package version to use?

sproshev

sproshev commented on Sep 27, 2018

@sproshev
Contributor

We at PyCharm met this problem while developing stub packages advertiser. There is no way to determine what stub package version fits installed runtime package without installing them from the newest one to the eldest.

JukkaL

JukkaL commented on Sep 27, 2018

@JukkaL
Contributor

@sproshev Fair point. I'm not sure if there's anything we can do that would always work reliably.

We could perhaps include the package version in typeshed as metadata. The stub package version could be derived from that (and incremented by 1 for each update). For example, the stub package versions for numpy 1.14.1 could be something like 1.14.1, 1.14.1+1, 1.14.1+2, etc. However, in practice it would be quite likely that stubs wouldn't fully conform to any package version. Maybe the version would only be a best-effort hint, and would roughly correspond the latest package version that is known to have some support by the stub (i.e. works in a reasonable fashion, but is not necessarily complete).

I don't think that we need anything perfect here. I'd be happy with an approach that works well for simple cases (most modules probably have pretty simple interfaces) and supports gradual refinement for more complex cases. The key would be making it easy for users to contribute. Crowdsourcing seems the only feasible approach for stubs.

Other ideas for guidelines:

  1. We'd normally only actively maintain one set of stubs per package (the latest supported). Stubs for older versions would only be available on PyPI (and git history). Maybe we can make exceptions for certain very popular libraries if they break backward compatibility without changing the package name, but hopefully this is pretty rare.
  2. For existing stubs we should perhaps initially pick some arbitrary version (such as 0.0.1) until somebody validates which version the stubs best conform to.
emmatyping

emmatyping commented on Oct 4, 2018

@emmatyping
Member

Okay, so after thinking this over, I decided I really like this idea. I definitely think it would be a good idea to reach out to the typescript folks and hear about their experiences, since they likely have already hit problems we will in executing this plan.

As for versioning, this is actually covered in PEP 561. Essentially, the stub package declares which version(s) of the runtime package it supports in the install requires. Therefore we should be able to tell what package fulfill requirements based on the requires_dist metadata in a package.

At worst, you have to download (but not install!) several wheels. I think this gets a lot easier if we work with the warehouse folks to make sure the metadata we need is available through their JSON API (some if not all of it already is).

With regards to partial packages, I would say people can start with their own incomplete packages, and once they pass a minimum bar, they can be brought into typeshed.

As for plugins, I don't really see an issue with including them, because they are not likely to cause issues with other type checkers, and they don't take that much disk space. Alternatively, we could do some packaging magic and make a separate plugin package that gets installed if you ask for that extra (e.g. pip install pkg-stubs[plugin]). That wouldn't be too painful as we can automate a lot of it.

Lastly, and probably most importantly, reviewing the stubs. I am also happy to step up and do more review as well. I also think pulling something like the mypy test suite out of mypy and making it more generic is a good idea, as is requiring new packages/improvements to be tested.

Related to reviewing, I have been inspired by Marietta to hold "office hours". The idea is to help people with static typing and PEP 561 packaging for probably about an hour every week. I am hoping to start holding hours sometime this month. If there are others who want to do this too perhaps we could coordinate.

130 remaining items

ilevkivskyi

ilevkivskyi commented on Jan 29, 2021

@ilevkivskyi
Member

I think we are almost ready, sorry for a little delay, we were not able to enable the auto-upload today, we will enable it Monday morning. In the meantime, I think it is already OK to start merging PRs again.

ymotongpoo

ymotongpoo commented on Feb 1, 2021

@ymotongpoo

First of all, thanks for working for the better type hinting. I look forward to all type information packages coming out to PyPI. Now I have a question on the upcoming updates on the type information metadata in all libraries in this repository.

How does the metadata file (METADATA.toml) will include the supporting runtime package versions? I saw a discussion around the versioning of stub packages but wasn't confirmed the concrete mention on the description on the metadata file.

If I missed the exact part that tells the answer for them, apologies in advance.

ilevkivskyi

ilevkivskyi commented on Feb 2, 2021

@ilevkivskyi
Member

I finally switched the auto-upload task from --dry-run to normal typeshed-internal/stub_uploader@96cb137, so I think this issue may now be closed as implemented.

ilevkivskyi

ilevkivskyi commented on Feb 2, 2021

@ilevkivskyi
Member

How does the metadata file (METADATA.toml) will include the supporting runtime package versions?

Currently you can include version = "x.y" in metadata which means that this stub package only valid for runtime package version x.y or newer.

ymotongpoo

ymotongpoo commented on Feb 3, 2021

@ymotongpoo

@ilevkivskyi Thank you for the reply. Then now the stub packages will be automatically uploaded to PyPI based on that version information?

For example, the stub for paramiko is version "0.1", whose runtime package's current latest version is "2.7.2". In this example, given paramiko is following semver, 2 major version diff means huge interface breaking changes happened during the update, which means at least the stub version should be 2.0, preferably 2.7. And I see other similar cases.

Can we assume that now all stub packages are ready to accept the pull requests for such version number corrections?

hauntsaninja

hauntsaninja commented on Feb 3, 2021

@hauntsaninja
Collaborator

Yes, we'd accept PRs that update METADATA.toml to the oldest version of the package that the stubs reflect support for (#4981)

JukkaL

JukkaL commented on Feb 3, 2021

@JukkaL
Contributor

When adding new stubs, the version should reflect the actual version of the package that stubs were created for. Since this will be hard to determine reliably for existing stubs, it's okay for the version to refer to some older version with which they are still compatible.

We probably also don't want the version to refer to a more recent version version of the package, or a version that is poorly supported. If a stub is created for version 2.3 of a library but also contains a few 2.4 features (but we know/suspect that some important 2.4 features are missing), I think it's fine to continue to use 2.3 as the version of the stubs.

ymotongpoo

ymotongpoo commented on Feb 5, 2021

@ymotongpoo

@hauntsaninja @JukkaL Thank you for the reply.

I understand that we need to avoid the confusion that the version changes will bring and we can keep using versions in the existing METADATA.toml files. However, that applies only to the case the stub version is in the same major version as the package current, and not to the case I referred to. (i.e. 2 major version diffs)

Anyway, it's great that the discussion is already happening in #4981 and I'm happy to contribute to this part.

ofek

ofek commented on Aug 18, 2024

@ofek
Sponsor

FYI I just opened a PEP about package repository namespaces and I used typeshed as the first motivating example 🙂 https://discuss.python.org/t/pep-752-package-repository-namespaces/61227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    project: policyOrganization of the typeshed project

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @jstasiak@srittau@LouisStAmour@ymotongpoo@asvetlov

        Issue actions

          Explore building third party stubs as packages · Issue #2491 · python/typeshed