ENH: run-procedure for BIDS dataset configuration #114
Comments
That's the 'standard' way to approach a BIDS dataset. It makes sense to keep root directory info in git while the rest goes into the annex (it also makes for an easy catalog :-)) 👍🏻
Generally, I think it does make sense, but the problem lies in
Editing something to be so, implies that there was a state before that, which must never have been |
Fair point, although that problem/challenge exists whether one applies a run-procedure or not. It is something that the people managing the data would need to consider in any case when they turn it into a datalad dataset. |
Yes, but a default that annexes everything doesn't lead you into a trap. Public and restricted content can still be separated in terms of storage. It may be a little less convenient, but you don't end up in a situation that is really hard to fix. To be fair: the existence of a procedure isn't exactly a default. I'm a bit worried, though, that it goes the way of
I think this is a sane approach, with two caveats (though keep in mind that my knowledge of the BIDS spec might not be up to date):
My biggest concern with this approach is when participants need to be removed. If the |
I agree, I would be hesitant to put anything other than a README and a LICENSE into git by default. Code is another candidate, but only if the file identifiers are at minimum pseudonymized. |
There is always a "hard to strike" balance in what to put into git and what into git-annex. For heudiconv, all .json and .tsv go into
I'm wondering if it would be useful to add a run-procedure to this extension to configure BIDS+datalad datasets such that all files in the root BIDS directory are committed to `git` while all the rest of the files, irrespective of type, go to the annex?

I'm thinking of use-cases related to distributed dataset-level metadata extraction and catalog generation. Data in the annex (typically all subfolders of the root BIDS directory) would need to be protected because of data privacy concerns, while data in the root directory (`participants.tsv`, `dataset_description.json`, any JSON sidecar files defined at the root level, any additional dataset-level metadata added at root level) are typically considered non-sensitive or have specifically been edited to be so, and can therefore be considered safe to commit to git.

Configuring a dataset like that (as opposed to annexing all files in the dataset) would allow sufficient metadata extraction on any clone without requiring access to the annex.
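
To illustrate what metadata extraction without annex access could look like, here is a minimal Python sketch. The function name `read_root_metadata` and the returned keys are hypothetical, and it assumes the clone uses the default symlink-based annex layout, where an annexed file without retrieved content is a dangling symlink. Files kept in git are regular files and can be read directly.

```python
import json
from pathlib import Path


def read_root_metadata(dataset_root):
    """Collect dataset-level metadata from root files whose content is present.

    Hypothetical helper for illustration; not part of DataLad or this extension.
    """
    root = Path(dataset_root)
    metadata = {}

    description = root / "dataset_description.json"
    # is_file() follows symlinks, so it is False for an annexed file whose
    # content has not been retrieved (a dangling symlink).
    if description.is_file():
        metadata["description"] = json.loads(
            description.read_text(encoding="utf-8")
        )

    participants = root / "participants.tsv"
    if participants.is_file():
        # Keep only non-empty lines; the first one is the column header.
        lines = [
            line
            for line in participants.read_text(encoding="utf-8").splitlines()
            if line.strip()
        ]
        metadata["participant_columns"] = lines[0].split("\t") if lines else []
        metadata["n_participants"] = max(len(lines) - 1, 0)

    return metadata
```

With root files committed to git, both entries would be filled on a fresh clone. With everything annexed, the function would return an empty dictionary until the content is fetched.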
The run procedure would add something like this to `.gitattributes`:

The procedure (let's call it `rootfiles2git`) would be available in this extension because it seems (to me) like it could be generally applicable to BIDS datasets collected in the EU (because of GDPR).

WDYT @yarikoptic @bpoldrack @mslw @CPernet @loj
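
The `.gitattributes` snippet referred to above did not survive in this text, so the following is only a guess at what such a procedure might do, written as a small Python sketch. The two rules are an assumption: they rely on git-annex's `annex.largefiles` attribute and on the gitattributes convention that a later matching line overrides an earlier one. The function name `add_rootfiles2git_rules` is hypothetical.

```python
from pathlib import Path

# Assumed rules, not taken from the issue text:
#   1. annex every file by default,
#   2. then exempt entries directly in the dataset root ("/*" does not match
#      paths inside subdirectories), so root-level files go to git.
ROOTFILES2GIT_RULES = [
    "* annex.largefiles=anything",
    "/* annex.largefiles=nothing",
]


def add_rootfiles2git_rules(dataset_root):
    """Append the assumed rules to .gitattributes; return True if it changed."""
    attrs = Path(dataset_root) / ".gitattributes"
    existing = (
        attrs.read_text(encoding="utf-8").splitlines() if attrs.exists() else []
    )
    # Nothing to do if the rules already end the file in the right order.
    if existing[-len(ROOTFILES2GIT_RULES):] == ROOTFILES2GIT_RULES:
        return False
    attrs.write_text(
        "\n".join(existing + ROOTFILES2GIT_RULES) + "\n", encoding="utf-8"
    )
    return True
```

Appending both rules together keeps their relative order intact, which matters because the root-level exemption only works if it comes after the catch-all rule. Whether these exact patterns match what was originally proposed is not known from the text.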