Embargo end date should not be a user populated field #2116

Open
jjnesbitt opened this issue Dec 18, 2024 · 10 comments

Comments

@jjnesbitt
Member

Currently, the embargoedUntil field in the asset metadata is required to be set if the dandiset is embargoed. However, this is often not set, and as a result, people are left with validation errors. Here is where that error is generated in dandi-schema.

IMO this should not be a user filled field, and should instead be automatically populated by the archive. When a user first creates an embargoed dandiset, we could require that the user set an embargo end date, with a maximum of 1 year from the current date (as per NIH requirements). This would then automatically populate that field in the schema, and when that date arrives, we would automatically unembargo that dandiset. If the user unembargoes the dandiset before the previously set end date, we can update the field to the date it was actually unembargoed.

Thoughts @satra @yarikoptic?

@satra
Member

satra commented Dec 18, 2024

@jjnesbitt - the nih requirement is more complicated: end of award or publication of paper whichever is earlier. we could ask the user for end of award date when they create an embargoed dataset. and use that to set any asset. i agree that this shouldn't be a user filled element, and should be updated when unembargoed.

@jjnesbitt
Member Author

> @jjnesbitt - the nih requirement is more complicated: end of award or publication of paper whichever is earlier. we could ask the user for end of award date when they create an embargoed dataset. and use that to set any asset. i agree that this shouldn't be a user filled element, and should be updated when unembargoed.

Thanks for the clarification. Is there any concern about verifying this end of award date? For example, a user could set an "end of award" date 10 years in the future, just because they don't feel like dealing with it.

@satra
Member

satra commented Dec 19, 2024

no award is typically more than 5 years. so we should put in that sanity check. we can perhaps verify and update as an admin component. when that happens, i suspect some command may need to be run to update everything. for nih awards, we should really use the reporter.nih.gov API to pull in award name and project end date. however, not all awards will be nih. hence this may require a bit of work to restructure the registration (people have also complained that we only say nih on that dandiset embargo registration page).
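A minimal sketch of the sanity check mentioned above, assuming the cap is counted from the day the embargoed dandiset is registered. The names `MAX_AWARD_YEARS`, `add_years`, and `is_plausible_end_date` are hypothetical and not part of the archive code.

```python
from datetime import date

# Assumption: 5 years, taken from the comment above ("no award is typically more than 5 years").
MAX_AWARD_YEARS = 5


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later, falling back to Feb 28 for Feb 29."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def is_plausible_end_date(end: date, today: date) -> bool:
    """Accept end dates from today up to MAX_AWARD_YEARS in the future."""
    return today <= end <= add_years(today, MAX_AWARD_YEARS)
```

With this check, an "end of award" date 10 years out would be rejected at registration time rather than found later by an admin.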

@kabilar
Member

kabilar commented Dec 20, 2024

Thanks @jjnesbitt.

> ...we could require that the user set an embargo end date, with a maximum of 1 year from the current date (as per NIH requirements).

I might be missing some context but why can the embargo end date only be a maximum of 1 year from the current date?

> ...when that time comes, we would automatically unembargo that dandiset.

If the Archive is to automatically unembargo a Dandiset, I would suggest that we should send out several warning emails (6 months prior to unembargo date, 1 month prior, after unembargo), and provide the option to extend the unembargo date if the grant was extended. It is fairly common for the grant end date to be extended by a year.
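A rough sketch of how the warning schedule above could be computed, assuming the "after unembargo" email goes out on the unembargo date itself. The function names and dictionary keys are hypothetical.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notification_dates(unembargo_on: date) -> dict:
    """Map each warning email to the date on which it would be sent."""
    return {
        "six_months_prior": months_before(unembargo_on, 6),
        "one_month_prior": months_before(unembargo_on, 1),
        # Assumption: the final notice is sent on the unembargo date itself.
        "after_unembargo": unembargo_on,
    }
```

If the grant is extended, the schedule can be recomputed from the new date, so no separate state is needed for the reminders.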

@jjnesbitt
Member Author

> I might be missing some context but why can the embargo end date only be a maximum of 1 year from the current date?

That's what I thought the NIH requirement was. Satra has already pointed out that there's more to it than that, so I must've been mistaken. My point was that there should be some maximum, so that embargoed dandisets aren't embargoed indefinitely.

> If the Archive is to automatically unembargo a Dandiset, I would suggest that we should send out several warning emails (6 months prior to unembargo date, 1 month prior, after unembargo), and provide the option to extend the unembargo date if the grant was extended. It is fairly common for the grant end date to be extended by a year.

I agree. I suppose the question is then, do we care to verify the embargo end date, and if so, how? I like the idea of the user that's creating a dandiset providing an award number, and us automatically pulling in that info (including end date). However, I'm not sure how many non NIH awards we'll need to deal with.

@kabilar
Member

kabilar commented Dec 20, 2024

> > If the Archive is to automatically unembargo a Dandiset, I would suggest that we should send out several warning emails (6 months prior to unembargo date, 1 month prior, after unembargo), and provide the option to extend the unembargo date if the grant was extended. It is fairly common for the grant end date to be extended by a year.
>
> I agree. I suppose the question is then, do we care to verify the embargo end date, and if so, how? I like the idea of the user that's creating a dandiset providing an award number, and us automatically pulling in that info (including end date). However, I'm not sure how many non NIH awards we'll need to deal with.

I think that it would be good to verify the project end date. The NIH Reporter API allows one to search for projects and returns the project end date. I'm not sure how quickly it is updated if the grant is extended. I'm also not sure how many non NIH awards we'll need to deal with, but perhaps we can just not verify the non NIH awards for now and address those cases in the future based on user demand?

@satra
Member

satra commented Dec 20, 2024

on the UI side for an embargoed dandiset registration:

  • if NIH award: allow searching info from reporter api and allow setting to earlier date if needed
  • if not NIH award: [enter source of funding], set embargo end date to +1 year

on the admin side:

  • create an internal display of embargoed dandisets, end dates, verification flag
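The registration rules above could be sketched roughly as follows. This assumes the NIH project end date has already been looked up elsewhere and is passed in as a plain date; `resolve_embargo_end` and its parameters are hypothetical names, and a requested date later than the project end is simply ignored here.

```python
from datetime import date
from typing import Optional


def plus_one_year(d: date) -> date:
    """Same calendar day one year later, with Feb 29 mapped to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def resolve_embargo_end(
    today: date,
    nih_project_end: Optional[date] = None,
    requested: Optional[date] = None,
) -> date:
    """Pick the embargo end date at registration time."""
    if nih_project_end is not None:
        # NIH award: use the project end date, but allow the user to choose an earlier one.
        if requested is not None and requested < nih_project_end:
            return requested
        return nih_project_end
    # Non-NIH award: default to one year from registration.
    return plus_one_year(today)
```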

@jjnesbitt
Member Author

jjnesbitt commented Dec 20, 2024

> • create an internal display of embargoed dandisets, end dates, verification flag

That seems like a pretty good set of requirements. What would the verification flag represent exactly? If the award number (and thus, end date) has been verified?

@satra
Member

satra commented Dec 20, 2024

> What would the verification flag represent exactly?

  • if NIH auto-extracted, it could be set to True
  • if non NIH, a human (admin) may have to verify and set the flag.
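One possible shape for the verification flag and the admin listing, as a sketch only. `EmbargoRecord`, its fields, and `needs_admin_review` are hypothetical and do not reflect the archive's actual models.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class EmbargoRecord:
    dandiset_id: str
    end_date: date
    funding_source: str
    nih_auto_extracted: bool = False  # end date pulled automatically for an NIH award
    admin_verified: bool = False      # set by hand by an admin for non-NIH awards

    @property
    def verified(self) -> bool:
        """True if the end date was auto-extracted or confirmed by an admin."""
        return self.nih_auto_extracted or self.admin_verified


def needs_admin_review(records: List[EmbargoRecord]) -> List[EmbargoRecord]:
    """Rows the internal admin display would highlight, soonest end date first."""
    pending = [r for r in records if not r.verified]
    return sorted(pending, key=lambda r: r.end_date)
```

Keeping the two sources of verification as separate fields makes it possible to tell later whether a date came from an automatic lookup or from a manual check.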

@waxlamp
Member

waxlamp commented Jan 7, 2025

Due to the trouble this is causing for end users in #2089, I think we should come up with a short term solution to re-enable usability while we tackle the deeper policy problems here. Here are three solution strategies I think we could pursue, in descending order of speed-to-deploy and ascending order of pain-to-implement:

  1. Set an end date of Jan 1, 1970 for all embargoed Dandisets. This is the UNIX epoch, which would serve as a good sentinel value that we are deferring this problem until we can implement the solutions below, or some other solution. We would update all existing embargoed Dandisets to have this end date, and ensure that all new embargoed Dandisets have it as well. Since we currently have no logic implemented for validating "past due" embargo statuses and no automated unembargo in place, this end date will not affect operations, other than removing the validation error message that currently afflicts all embargoed Dandisets.
  2. Create an admin work queue to manually set an end date on a case-by-case basis. This approach would resemble the current workflow for approving new user accounts. An email would go to the admins for every embargoed Dandiset that would link to an admin page to examine each Dandiset, present information on the reported NIH award number, and accept an end date with which to update the Dandiset and complete the embargo process.
  3. Automatically retrieve end dates from NIH Reporter. This would be a sort of gold standard, but I suspect it is trickier than it seems to properly implement. On top of that, the Reporter API may not be up to date, and any errors in the reported NIH award number would cause this to fail immediately. There is also the possibility that a malicious user could report the wrong award number and hijack another study's end date to fraudulently embargo data that should not be embargoed. All of these problems would seem to require a workflow more like (2) above.

As a practical matter, I think a quick solution that gives way to a better, slower solution would be valuable to prevent further pain to end users. That means implementing (1) and then getting serious about exploring whether (2), (3), or some other solution is the proper permanent solution.
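For illustration, option (1) might look something like the sketch below. It treats the metadata as a plain dict and stores the date as an ISO string, which is an assumption about the field format; only the `embargoedUntil` field name comes from this thread. It also only fills the field when it is missing, which is a narrower choice than overwriting every embargoed Dandiset.

```python
from datetime import date

# UNIX epoch, used as a sentinel meaning "end date not yet determined".
EMBARGO_SENTINEL = date(1970, 1, 1)


def apply_sentinel(metadata: dict) -> dict:
    """Return a copy of the metadata with embargoedUntil filled in if it is missing."""
    updated = dict(metadata)
    if not updated.get("embargoedUntil"):
        # Assumption: the field is stored as an ISO 8601 date string.
        updated["embargoedUntil"] = EMBARGO_SENTINEL.isoformat()
    return updated


def is_sentinel(metadata: dict) -> bool:
    """True if the end date is still the placeholder and needs a real value."""
    return metadata.get("embargoedUntil") == EMBARGO_SENTINEL.isoformat()
```

A helper like `is_sentinel` would matter once automated unembargo exists, since a 1970 end date must not be read as "past due".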

@satra, @yarikoptic, @kabilar: thoughts?
