cfg_* procedure(s) for preferable .gitattributes for various known dataset types #71

Open
yarikoptic opened this issue Jul 17, 2019 · 0 comments

Comments

@yarikoptic
Member

ATM we have cfg_bids which

  • sets up .gitattributes to have some files directly in git
  • sets up metadata extraction configuration

But besides BIDS, I keep running into the need to establish .gitattributes for the following dataset types, where I think something analogous to the BIDS procedure should be done:

.feat and .gfeat FSL outputs

  • .gitattributes - maybe use cfg_text2git?

On a sample .gfeat directory of 9GB, a regular cfg_text2git left me with a 260KB .git/objects.

That made it possible to quickly install the dataset elsewhere and then fetch only the images with datalad get **/*.png

  • metadata
    • datalad: eventually might configure the extractor
    • git-annex: we might like to annotate files with annex metadata (e.g. file types), so that on shells without ** support people could quickly get all the supplementary data files needed to browse the results
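
Until such annotation exists, the `**/*.png` selection can be reproduced without shell support for `**`. The following is only a minimal sketch using the Python standard library; the helper names `find_matching` and `build_get_command` are hypothetical and not part of DataLad. It only builds the `datalad get` command line and does not run it.

```python
import fnmatch
import os


def find_matching(top, pattern="*.png"):
    """Return sorted paths (relative to `top`) whose file name matches `pattern`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Do not descend into the repository internals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                full = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full, top))
    return sorted(matches)


def build_get_command(top, pattern="*.png"):
    """Build (but do not execute) a `datalad get` call for the matching files."""
    return ["datalad", "get", *find_matching(top, pattern)]
```

The resulting list could be handed to `subprocess.run(..., cwd=top)`. File names are matched only by their last path component, which is enough for patterns such as `*.png`.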

fmriprep

  • .gitattributes

    I had

*.md annex.largefiles=nothing
*.html annex.largefiles=nothing
*.json annex.largefiles=nothing
CITATION.* annex.largefiles=(not(mimetype=text/*))

which resulted in a 32MB .git/objects for a ~500GB dataset (~250 subjects).

  • metadata
    • configure extractors (nifti1, bids, maybe more once FreeSurfer etc. are supported)
    • interesting use case: the BIDS(-derivative) dataset is not at the top of this dataset, which has two directories (fmriprep and freesurfer), so the bids extractor should be told to look into fmriprep/
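
One way to find where the BIDS(-derivative) content lives is to look for `dataset_description.json` at or just below the top of the dataset. This is a rough sketch under that assumption; `find_bids_roots` is a hypothetical helper and says nothing about how the bids extractor would actually be configured.

```python
from pathlib import Path


def find_bids_roots(top, max_depth=1):
    """Return directories at or below `top` that hold dataset_description.json.

    Only descends `max_depth` levels and skips the .git directory.
    """
    top = Path(top)
    candidates = [top]
    level = [top]
    for _ in range(max_depth):
        level = [
            child
            for parent in level
            for child in sorted(parent.iterdir())
            if child.is_dir() and child.name != ".git"
        ]
        candidates.extend(level)

    def has_description(directory):
        marker = directory / "dataset_description.json"
        # An annexed file without content is a broken symlink, so accept that too.
        return marker.exists() or marker.is_symlink()

    return [d for d in candidates if has_description(d)]
```

For the layout described above, this would be expected to report the fmriprep/ directory rather than the top level, provided fmriprep/ carries its own `dataset_description.json`.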

HOWTO

Pretty much all of these scenarios are very similar and require only a slightly different specification. I see two implementation possibilities:

breed cfg_* scripts

  • extract common code from cfg_bids into some cfg_common.py helper
  • reuse from within individual cfg_bids, cfg_feat, cfg_fmriprep
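
A shared helper of this kind might look roughly like the sketch below. Everything here is illustrative: `GITATTRIBUTES_RULES` and `ensure_gitattributes` are hypothetical names, the code does not reflect how cfg_bids is currently written, and only the fmriprep rules are taken from this issue. Committing the change is left out.

```python
from pathlib import Path

# Rules keyed by dataset type. Only the fmriprep entry comes from this issue;
# other types would need their own entries.
GITATTRIBUTES_RULES = {
    "fmriprep": [
        "*.md annex.largefiles=nothing",
        "*.html annex.largefiles=nothing",
        "*.json annex.largefiles=nothing",
        "CITATION.* annex.largefiles=(not(mimetype=text/*))",
    ],
}


def ensure_gitattributes(dataset_path, rules):
    """Append any of `rules` missing from <dataset_path>/.gitattributes.

    Returns the list of rules that were actually added, so calling it
    twice with the same rules changes nothing the second time.
    """
    attrs_file = Path(dataset_path) / ".gitattributes"
    existing = attrs_file.read_text().splitlines() if attrs_file.exists() else []
    missing = [rule for rule in rules if rule not in existing]
    if missing:
        attrs_file.write_text("\n".join(existing + missing) + "\n")
    return missing
```

An individual cfg_fmriprep script would then reduce to something like `ensure_gitattributes(path, GITATTRIBUTES_RULES["fmriprep"])` plus its type-specific metadata configuration.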

create (optionally parametrized) cfg_neuroimaging_dataset

which would sense the type of the dataset (or have it "forced" via an explicit parameter such as "bids") and act accordingly. If the type cannot be figured out and no explicit parameter is specified, it should crash.
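
The sense-or-force logic could be sketched as follows. The detection heuristics (directory suffix, presence of an fmriprep/ subdirectory, a top-level `dataset_description.json`) are assumptions made for illustration, and `detect_dataset_type` and `KNOWN_TYPES` are hypothetical names.

```python
from pathlib import Path

KNOWN_TYPES = ("bids", "feat", "fmriprep")


def detect_dataset_type(path, forced=None):
    """Return the dataset type, honoring an explicitly forced value.

    Raises RuntimeError if the type cannot be sensed and none was forced.
    """
    path = Path(path)
    if forced is not None:
        if forced not in KNOWN_TYPES:
            raise ValueError(f"unknown dataset type: {forced!r}")
        return forced
    # FSL outputs are directories named *.feat or *.gfeat.
    if path.suffix in (".feat", ".gfeat"):
        return "feat"
    # Checked before plain BIDS, since the derivative lives in a subdirectory.
    if (path / "fmriprep").is_dir():
        return "fmriprep"
    if (path / "dataset_description.json").exists():
        return "bids"
    raise RuntimeError(
        f"could not determine the dataset type of {path}; pass it explicitly"
    )
```

Failing loudly when nothing matches keeps the procedure from silently applying the wrong .gitattributes to an unrecognized dataset.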
