

@jl-wynen (Member) commented Sep 29, 2025

This allows overriding the get_path function for data files so that it uses local files instead of downloading them with pooch. Simply set the environment variable SCIPP_OVERRIDE_DATA_DIR=/path/to/data, and get_path will return paths in that folder instead of downloading anything. Downstream packages only need to be updated to use the new make_registry function.

The override folder must have the same layout as the HTTP server. I achieved that by symlinking /dmsc/codeshelf/ci/ess to /nfs/www/html/groups/scipp/ess, so any file that we use in any ess.* package should be available automatically.
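To illustrate what "same layout" means: a file's path relative to the server's base URL is reused verbatim under the override directory. The helper below is hypothetical, the host example.org is a placeholder, and the assumption that /nfs/www/html is the web server's document root (so files in /nfs/www/html/groups/scipp are served under /groups/scipp) is inferred from the paths above rather than stated in the PR.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, base_url: str, data_dir: str) -> PurePosixPath:
    """Map a download URL to the mirrored file under data_dir (hypothetical helper)."""
    url_parts, base_parts = urlsplit(url), urlsplit(base_url)
    if url_parts.netloc != base_parts.netloc:
        raise ValueError(f"{url} is not served from {base_url}")
    # relative_to raises ValueError if the URL path is not below the base path.
    relative = PurePosixPath(url_parts.path).relative_to(base_parts.path)
    return PurePosixPath(data_dir) / relative


print(local_path_for(
    "https://example.org/groups/scipp/ess/bifrost/1/x.h5",
    "https://example.org/groups/scipp",
    "/path/to/data",
))
```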

This is a binary setting: either all files are downloaded or all files are accessed locally. I chose this so that we notice when we forget to provide a file locally. Otherwise, the file would silently be downloaded instead, which can lead to (flaky) timeouts.
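Since a missing local file is an error rather than a silent download, a CI job could also check all expected files up front instead of discovering them one failing test at a time. A minimal sketch, assuming the relative file names can be obtained from the registry (the helper name and its inputs are hypothetical):

```python
from pathlib import Path
from typing import Iterable


def find_missing_files(data_dir: str, names: Iterable[str]) -> list[str]:
    """Return the relative file names that do not exist under data_dir, sorted."""
    root = Path(data_dir)
    return sorted(name for name in names if not (root / name).is_file())
```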

See https://git.esss.dk/dram/code-shelf/code-shelf-template/-/merge_requests/7 for how it can be used on gitlab.

I tested this on a GitLab runner. Unfortunately, I could not get it to work automatically with a dev build of essreduce because of conflicts between conda and pip packages. However, it did work in an interactive session with some pip install hacks.

@jl-wynen requested a review from @nvaytet on September 29, 2025, 10:56

```diff
-_bifrost_registry = Registry(
-    instrument='bifrost',
+_bifrost_registry = make_registry(
```

I'm somewhat confused about why we have registries for specific instruments in here. I didn't actually know we had them.

As far as I can see, they are only used by the unit tests? Should we just define something in the test folder instead?
I think each technique sub-package has its own data registry and does not need the registries here?


```python
def _import_pooch() -> Any:
    try:
        import pooch
```

Could we use lazy_loader to avoid these try/except blocks? We could then import pooch at the top of the module, and it would only actually be imported when needed.
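The lazy_loader package offers lazy_loader.load("pooch") for this. To keep the sketch dependency-free, the same idea is shown here with the standard library's importlib.util.LazyLoader (the lazy-import recipe from the importlib documentation), which is a swapped-in technique rather than lazy_loader itself. The _MissingModule placeholder and lazy_import helper are hypothetical; they defer the error for an absent optional dependency until it is actually used.

```python
import importlib.util
import sys
import types


class _MissingModule(types.ModuleType):
    """Placeholder for an absent optional dependency; fails only on use."""

    def __getattr__(self, attr: str):
        raise ModuleNotFoundError(
            f"'{self.__name__}' is required for this feature but is not installed"
        )


def lazy_import(name: str) -> types.ModuleType:
    """Return `name` as a lazily executed module (stdlib LazyLoader recipe)."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        return _MissingModule(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Prepares the module; its code actually runs on first attribute access.
    loader.exec_module(module)
    return module


# Module level, like a plain `import pooch`, but nothing is executed yet.
pooch = lazy_import("pooch")
```

The try/except then lives in one helper instead of at every call site. Whether lazy_loader itself is preferable (it is an extra dependency) is a judgment call.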

```python
    :
        Either a :class:`PoochRegistry` or :class:`LocalRegistry`.
    """
    if (override := os.environ.get(_LOCAL_REGISTRY_ENV_VAR)) is not None:
```

@nvaytet (Member) commented Oct 2, 2025

As you wrote in the PR description, this is all or nothing: either all files are local, or all files come from pooch.
I'm wondering whether this will be impractical. I had imagined some sort of priority: first check whether a file is available locally and, if not, fall back to pooch?

In addition, does the current approach mean that the environment variable has to be set before the packages are imported? If so, it would be nice if the order did not matter, i.e., the variable could be set at any time and would still be picked up.
