Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better S3 Support for MagViews #1278

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

Better S3 Support for MagViews #1278

wants to merge 7 commits into from

Conversation

Tobias314
Copy link

@Tobias314 Tobias314 commented Mar 25, 2025

Description:

  • Context is creating a "downsamped" version (dataset with original + downsampled mags) of an original dataset without modifying the original dataset but by creating a "shallow" copy instead
  • The underlying problem is that I want to create a new dataset B referencing Mag1 from another (remote) S3 dataset A and then adding downsampled Mag2, Mag4, Mag8, ... to dataset B without modifying A
  • Previous code had logic which inferred layer.path from the paths of the layer's mags, which leads to problems for layers with both remote and local mags
  • Proposed solution defines layer.path to always be dataset_path / layer.name

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog
  • Updated Documentation
  • Added / Updated Tests
  • Considered adding this to the Examples

@Tobias314 Tobias314 self-assigned this Apr 7, 2025
)
)
or (endpoint_url := path.storage_options.get("endpoint_url", None))
or (endpoint_url := os.environ.get("S3_ENDPOINT_URL", None))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check whether this is required, would be better to always infer endpoint_url from UPath.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rely on the UPath

@Tobias314 Tobias314 requested review from markbader and normanrz April 7, 2025 17:15
@Tobias314
Copy link
Author

@normanrz @markbader I think we should discuss this PR. The proposed solution changes/redefines the behavior of layer.path which probably is a breaking change (also visible by failing tests). However, since layer.path is not specified in the datasource_properties.json, it is questionable whether there is a good way to define it for layers with both local and remote mags...

Maybe we should discuss what the expected behavior should be and how to create a new dataset layer with some (not all) of its mags pointing to mags of other (remote) datasets?

Copy link
Member

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am not opposed to this change. We just need to make sure it works in both wklibs and vx.

)
)
or (endpoint_url := path.storage_options.get("endpoint_url", None))
or (endpoint_url := os.environ.get("S3_ENDPOINT_URL", None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rely on the UPath

@@ -780,6 +787,7 @@ def create(cls, path: Path, array_info: ArrayInfo) -> "Zarr3Array":
],
},
"create": True,
"open": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm?

if maybe_layer_path and is_remote
else self.dataset.path / self.name
)
return self.dataset.path / self.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the usages of Layer.path in wklibs and vx?

self._properties.mags.append(
MagViewProperties(
mag=mag_view.mag,
path=str(mag_view.path),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just casting str(..) probably doesn't work, because wk expects a different syntax for S3 paths, i.e. s3://{endpoint_url/{bucket_name}/{prefix} instead of s3://{bucket_name}/{prefix}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants