Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pluggable transfer types for record files. #91

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mesemus
Copy link

@mesemus mesemus commented Jan 19, 2025

Description

This pull request proposes refactoring of Invenio’s file transfer mechanism by making transfer types pluggable, moving beyond the existing fixed options (Local, Fetch, Remote). It also introduces a dedicated transfer field to contain all transfer-related metadata.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

* Introduces pluggable transfer types

* Adds support for multipart upload and remote transfer types

* Adds support for per-transfer type permissions

Co-authored-by: Mirek Simek <[email protected]>
@mesemus
Copy link
Author

mesemus commented Jan 19, 2025

Note: This pull request was originally proposed at inveniosoftware/rfcs#76 and has been relocated here to allow additional contributions from the CESNET team.

mesemus added a commit to oarepo/invenio-s3 that referenced this pull request Jan 19, 2025
* Implementation of multipart upload, as described in RFC 0072

* See inveniosoftware/rfcs#91

Co-authored-by: Mirek Simek <[email protected]>
mesemus added a commit to oarepo/docs-invenio-rdm that referenced this pull request Jan 19, 2025
* See inveniosoftware/rfcs#91 for details

Co-authored-by: Mirek Simek <[email protected]>
@mesemus
Copy link
Author

mesemus commented Jan 19, 2025

@mesemus mesemus marked this pull request as draft January 19, 2025 19:12
oarepo-bot pushed a commit to oarepo/invenio-s3 that referenced this pull request Jan 21, 2025
* Implementation of multipart upload, as described in RFC 0072

* See inveniosoftware/rfcs#91

Co-authored-by: Mirek Simek <[email protected]>
@mesemus
Copy link
Author

mesemus commented Feb 12, 2025

This PR is a continuation of #76

mirekys and others added 2 commits February 12, 2025 17:37
* Customization of files ui design
* Uppy.io files ui integration

Co-authored-by: Miroslav Bauer <[email protected]>
Copy link
Member

@slint slint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, and all concerns we've discussed have been addressed both on the backend and frontend. I'll let some more frontend-savvy folks to chime in for the use of window.invenio.files.

Thank you for putting the work in, looking very much forward to review/test the code!

Comment on lines +47 to +57
To make file transfers more consistent, we are introducing a new `transfer`
field to replace the current inconsistent usage of `storage_class`.

- In **invenio-files-rest**, `storage_class` indicates the file’s intended location or usage (e.g., `archive`, `standard`).
- In **invenio-records-resources**, it has been (mis)used to denote the transfer type (e.g., `local`, `fetch`, `remote`).

Going forward, the new `transfer` field will hold all transfer-related
information, while `storage_class` will continue to serve its original purpose
from `invenio-files-rest`.

The `transfer` field will be a dictionary with the following structure:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clarification of the difference (and original purpose) of the two fields!

Here we describe integration of [Uppy.io](https://uppy.io/) as a custom Files UI implementation addressing
use cases regarding reliable multi-part uploads.

<!-- TODO: move under inveniosoftware org? -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with us of course :)
I think there's some procedure to take care off for publishing on NPM.


- A slightly more complicated code.
- Some files UI components and Redux store actions that were private now need to be exported to allow reuse.
- Reliance on the global state (`window.invenio.files`) for registering the Files UI, along with an additional Webpack entry point that must be included in a specific order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the decoupling of where these objects come from by storing them in window.invenio.files.

On my first pass, I thought that swapping the loaded entrypoint via FILES_UI_ENTRYPOINT to be able to override is somewhat unconventional compared to our typical use of Overridable. In the end, I don't personally feel the difference is that big (both ways are just setting some global state), but probably @kpsherva and @zzacharo can chime in.

Copy link
Contributor

@kpsherva kpsherva Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be concerned here about increasing the maintenance cost - we are aiming here at supporting two file uploaders, while one of them, uppy, seems to be superior in terms of file upload handling and overall user experience. What would be the added value of keeping the default file uploader as an option and maintaining both? ping @ntarocco

Personally, I would try to keep the UI as simple as it gets. I wouldn't rely on global state, I would rather try to come up with some abstraction layer (class) which can handle different apiClient and service implementation(s).

I agree for the overridable, it is too complex use case to use it here. I would rather try to understand if we need to support two file uploaders instead of introducing another method of overriding/replacing components (even if it is as clever as the one you proposed :)) .

My overall tendency is simplification where it is possible, my proposal is to consider together having only one uploader.

thank you for the massive work and the amazing RFC, it is super clear and reads well!

Copy link

@mirekys mirekys Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand your concerns about the impact this has on code maintenance & complexity. It seemed quite a lot of work to me also, to make both UIs work while changing things in fileapi services around. If Uppy looks feasible to you as a replacement, without the option to fall back to the current lightweight UI, I'd be glad to update the RFC as to be a replacement of the current UI implementation. Just let me know @ntarocco @kpsherva.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpsherva the main motivation of developing this new uploader in parallel to the existing one was to make the development easier by using a feature flag, instead of long-lasting branches, and allow easier testing.
We cannot completely replace the uploader in production instances without running it for a while and gather enough knowledge and experience first: files upload is the critical feature of a repository.

I would limit the changes to any existing code as much as possible, and avoid using any Overiddable if possible. I would implement completely new and isolated code for this new uploader and, when enabled via a Python config, the new JS bundle should be injected instead of the existing uploader (if feasible).

The idea, again if possible, is to make these 2 swappable: test the new implementation for a defined amount of time, in different instances, and then switch when mature enough.

- A slightly more complicated code.
- Some files UI components and Redux store actions that were private now need to be exported to allow reuse.
- Reliance on the global state (`window.invenio.files`) for registering the Files UI, along with an additional Webpack entry point that must be included in a specific order.
- Need to create a Webpack entrypoint in site's webpack to be able to use any custom Files UI implementations, that are just NPM packages (like the Uppy.io integration above).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's intended or not, but I feel that the Uppy.io component and Webpack entrypoint should/can already live alongside the current file uploader component, as a first-class supported option for InvenioRDM.

Comment on lines +490 to +491
via `react-overridable` by mapping the desired component in `mapping.js`. That would avoid the need for an extra webpack entrypoint and using the global state altogether. However,
it remains unclear how site administrators could easily configure the correct services server-side (e.g., DepositFilesService, DepositFilesAPIClient), as they are purely JavaScript objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, often when we want to translate Flask/Python config to more dynamic JS functionality, we lose a lot of flexibility, and usually have to start double-keeping config-key-to-JS-object "registries".

#### UI workflow

Default dashboard state:
![Screenshot 2025-02-13 at 8 57 20](https://github.com/user-attachments/assets/85319625-7510-4482-b314-64eac835fc73)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look great, especially the auto-thumbnails :)

I feel there's a bit of information and functionality duplication with having both the tiles and table below.

I see the Uppy Dashboard components have decent customization, so I'm not that worried about being able to sync/combine the two for a good UX.


As an admin, I want to:
- Configure the default upload mechanism (TransferType) for new uploads.
- Configure which uploader UI should be used in the deposit form.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for very clear use cases, it reads well!

I am curious, what would be a motivation behind an admin wanting to change the UI of the file uploader? Do you mean in application runtime?

Copy link

@mirekys mirekys Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your review. This point comes from a discussion here: #86 (comment). I had an impression, that it would be useful for the admin to be able to choose whenever to keep the current simple UI, use the more powerful Uppy uploader, or to choose any other custom uploader UI implementation for a site. This is configured at runtime by setting, which webpack entrypoint should be used, but it must be included at build time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, by runtime I mean more "I go to the administration panel and I switch the file uploader as a setting from there, without redeploying the application" - but from the rest of the RFC I understand it is not the case, since you've mentioned it would be controlled by a config variable.
I think my confusion is coming from the fact that we call the actor "admin" in many cases, in how we are naming things it is not really distinguished from the "application maintainer" or "devops" :)

Copy link

@mirekys mirekys Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see :) I was writing this with "admin" meaning actually the devops guy, who configures invenio.cfg / runtime env config for a site. Changing this from the administration panel would pose additional challenge, but it's not impossible, given that all possible uploader bundles must be already built-in into the running site

[{
"key": "file.txt",
"transfer": {
"type": "R",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope the amount of types won't explode in the future, causing naming clashes :)


We will also introduce a `TransferSchema` to handle the new `transfer` field.
It will allow using different schemas for each transfer type. To achieve this,
we will leverage `marshmallow-oneofschema`, which is already included in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find the usage of oneofschema a bit complex, so I would like to suggest documenting it well in the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants