Amir came to the office hour and presented a superdataset into which he had accidentally saved ~380k files, with a total disk usage of multiple TB. As a result, the superdataset had become painfully slow to respond. He asked how to move the data into a subdataset instead of keeping it directly in the superdataset. We pointed him to https://knowledge-base.psychoinformatics.de/kbi/0013/index.html and advised him to split his directory (era5) into year-wise subdatasets.
Because the era5 data was added as the most recent change, @mih pointed to a simpler alternative that we exercised in today's office hour, using merely a cp with dereferencing and hardlinking:
mkdir era5_sub
cd era5_sub
mkdir 1970
# -l: hardlink instead of copying, -L: dereference symlinks, -R: recurse, -v: verbose
cp -v -l -L -R /p/largedata2/detectdata/CentralDB/era5/1970 .
datalad create --force
datalad save -m "new era5 1970" .
Dereferencing and hardlinking do not double the disk usage! The inodes are the same:
ls -i era5/1970/1970_01/<firstfile>
ls -i -L era5_sub/1970_01/<firstfile>
-> returns same inode!
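The inode check can also be tried out on throwaway files before touching real data. The following is only a minimal sketch, assuming GNU coreutils `cp`; the `objects/` directory, the key name, and `file1.nc` are hypothetical stand-ins that merely mimic how an annexed file is a symlink into an object store.

```shell
#!/bin/sh
# Sketch with made-up file names; assumes GNU coreutils cp.
set -eu
work=$(mktemp -d)
cd "$work"

# Mimic an annexed file: content sits in an object store,
# the working tree only holds a symlink pointing at it.
mkdir -p objects src/1970_01
printf 'payload\n' > objects/key1
ln -s ../../objects/key1 src/1970_01/file1.nc

# Same flags as in the office hour: hardlink (-l) the
# dereferenced (-L) files, recursively (-R).
mkdir sub
cp -l -L -R src/1970_01 sub/

# Inode of the symlink target in the source vs. the new file in the copy.
src_inode=$(ls -i -L src/1970_01/file1.nc | awk '{print $1}')
sub_inode=$(ls -i sub/1970_01/file1.nc | awk '{print $1}')
# Hard link count of the underlying content file.
link_count=$(ls -l objects/key1 | awk '{print $2}')

echo "source inode: $src_inode"
echo "copy inode:   $sub_inode"
echo "link count:   $link_count"
```

If both inode numbers match and the link count is 2, the copy is a second name for the same content on disk, so no additional space is used. The temporary directory is left in place for inspection and can be deleted afterwards.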
Then, in era5, the following still needs to happen to drop annexed content that is no longer referenced:
git annex unused
git annex dropunused all
We should consider writing this up as a KBI, too, and linking it to the one on splitting datasets (0013).
Origin: Office hour
This support event mostly documents the need for a command that splits datasets, similar to what https://knowledge-base.psychoinformatics.de/kbi/0013/index.html outlines, but with DataLad tooling for ease of use.
TODO (not necessarily to be performed in this order)