Separate private from public API: Underscore convention or `all` #290

vpratz · 2024-12-28T17:19:38Z

This is related to the documentation issue #280, where we only want to render the public API documentation, but goes a bit further.

In many cases, our current setup indicates a desired import structure (e.g. importing bayesflow.transforms.Transform instead of bayesflow.transforms.transform.Transform. This is intuitive, but it is implicit and we cannot automatically (for generating the API docs) tell that the latter is not intended usage and should be private.

As far as I can tell, there are two usual approaches:
a) prepending private modules and files with an underscore (e.g., bayesflow.transforms._transform.Transform)
b) adding public functions, classes and modules to __all__, while excluding all private ones

I think for Sphinx to work efficiently, we probably need b), either everywhere where we import functions that we want to make publicly availiable (when autosummary_imported_members = False), or everywhere where we use external modules (keras, scikit-learn, ...) with autosummary_imported_members = True.

One problem I encountered when trying this out is Python's peculiar behavior with relative imports. When doing

from .transform import Transform

we would expect that one variable is added to the namespace, Transform. But it turns out two objects are added: transform and Transform. The reason for this is explained in this thread.
For us, this means that without explicitly specifying __all__ or renaming transform to _transform, both are detected as public, even though our intention was to only make Transform publicly available.

We can do b) semi-automatically, by populating __all__ by a function like the following:

import inspect


def _add_imports_to_all(exclude=None):
    """Get global dict from stack."""
    exclude = exclude or []
    calling_module = inspect.stack()[1]
    local_stack = calling_module[0]
    global_vars = local_stack.f_globals
    if "__all__" not in global_vars:
        global_vars["__all__"] = []
    all_var = global_vars["__all__"]
    global_vars["__all__"] = [
        e for e in list(set(all_var).union(set(global_vars.keys()))) if e not in exclude and not e.startswith("_")
    ]

This would allow us to exclude unwanted relative imports (or, by modifying it a bit, only to add functions and classes, while manually specifying modules we want to add). If we also have a), we can exclude them automatically, as they start with an underscore.

Both solutions require a larger one-off effort, but have a comparatively low maintenance cost afterwards. As a) requires larger changes to the codebase, it would be beneficial to time the changes well to avoid large merge conflicts.

This issue is a bit technical, as both Python and Sphinx show somewhat peculiar behavior. If you have any questions, please send them so I can clarify if necessary. Also, if you have ideas for easier solutions, or think the effort is not worth the gains, please also share those considerations.

The text was updated successfully, but these errors were encountered:

paul-buerkner · 2025-01-02T14:07:52Z

Thank you for thinking about this issue. To my naive eye, both solutions look sensible. Which of the two solutions would @stefanradev93 and @LarsKue prefer?

vpratz · 2025-02-10T11:13:20Z

@stefanradev93 @LarsKue Bumping this, as we are nearing a release and this is relevant for having good public API documentation.

I think option a) would be cleanest, but would be a larger one-off effort to adjust all filenames and imports. After that, it would be cheap to maintain. I believe if we would want to make that change, now would be the best time to do so.
Option b), either manual or with the function mentioned above would be a little less clean, but would mainly require changing the public-facing subset of the ~40 __init__.py files.

I would offer to put some time in to make this happen. If we opt for a), it probably would be good to join forces to get it done in a day or two. What do you think?

LarsKue · 2025-02-10T11:30:55Z

I will have to have a closer look at how we generate docs and which modules or classes might be considered clutter (and thus should be private). I will prioritize this.

However, as a counter-argument, why should your example Transform be private? Either way of importing seems fine to me, and we already follow your suggestion a) to some extent.

vpratz · 2025-02-10T12:24:29Z

Thanks for taking a look. I did not mean that Transform should be private, but transform. This example is a bit outdated, so maybe let's take an example from the current code: In bayesflow/networks/__init__.py, we only want the classes to be public, but not the modules we import them from. For example the statement from .mlp import MLP will not only make MLP public, but mlp as well, which is not intended. So for a) we would rename mlp to _mlp, if the user should not access the module itself.
This is a quirk of the way Python's relative imports work, and why all of this is necessary in the first place.

LarsKue · 2025-02-10T12:54:44Z

I don't think having the modules public is necessarily an issue, unless it clutters the docs.

Users are still responsible for what they import, and it is unlikely that there will be significant confusion if we keep following naming conventions like CamelCase for classes.

vpratz · 2025-02-10T13:05:59Z

In my opinion it clutters the docs (you can take a look by checking out the gh-pages-dev branch), and it breaks the separation between the stable public API and internals, which might inhibit us to restructure the internals in the future if needed. As a user, I also find it somewhat confusing if I can import classes from multiple places ("What's the difference between bf.networks.MLP and bf.networks.mlp.MLP, and why are there two?"), which we can also avoid by hiding those modules.

vpratz mentioned this issue Jan 2, 2025

Building docs errors because INSTALL.rst no longer exists #280

Closed

paul-buerkner added documentation Improvements or additions to documentation refactoring Some code shall be redesigned discussion Discuss a topic or question not necessarily with a clear output in mind. labels Feb 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate private from public API: Underscore convention or `all` #290

Separate private from public API: Underscore convention or `all` #290

vpratz commented Dec 28, 2024

paul-buerkner commented Jan 2, 2025

vpratz commented Feb 10, 2025

LarsKue commented Feb 10, 2025

vpratz commented Feb 10, 2025

LarsKue commented Feb 10, 2025

vpratz commented Feb 10, 2025

Separate private from public API: Underscore convention or __all__ #290

Separate private from public API: Underscore convention or __all__ #290

Comments

vpratz commented Dec 28, 2024

paul-buerkner commented Jan 2, 2025

vpratz commented Feb 10, 2025

LarsKue commented Feb 10, 2025

vpratz commented Feb 10, 2025

LarsKue commented Feb 10, 2025

vpratz commented Feb 10, 2025

Separate private from public API: Underscore convention or `all` #290

Separate private from public API: Underscore convention or `all` #290