@g-antonello
Collaborator

The usage is described in the added file. In brief, my idea is that every time you generate a new data version, you also run a couple of extra functions with the same parameters used to generate that version. This should be enough to produce an md5sum object that can then be compared against when loading.

A data loading example is also shown in the same script.
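
For readers without the script at hand, here is a rough Python sketch of the general idea, not the functions added in this PR. It assumes the generation parameters are JSON-serializable and that the data is stored with `pickle`. The names `params_md5`, `save_version`, and `load_version` are hypothetical and chosen only for illustration.

```python
import hashlib
import json
import pickle
from pathlib import Path


def params_md5(params: dict) -> str:
    """Return an md5 hex digest of the generation parameters.

    Keys are sorted so the digest does not depend on dict ordering.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def save_version(data, params: dict, data_path) -> None:
    """Write the data and, next to it, the digest of its parameters."""
    data_path = Path(data_path)
    with data_path.open("wb") as fh:
        pickle.dump(data, fh)
    # Hypothetical convention: the digest lives in "<data file>.md5".
    Path(str(data_path) + ".md5").write_text(params_md5(params))


def load_version(params: dict, data_path, generate):
    """Load the stored data if its digest matches, else regenerate it."""
    data_path = Path(data_path)
    md5_path = Path(str(data_path) + ".md5")
    if data_path.exists() and md5_path.exists():
        if md5_path.read_text().strip() == params_md5(params):
            # Same parameters as the stored version: skip regeneration.
            with data_path.open("rb") as fh:
                return pickle.load(fh)
    # Missing file or different parameters: regenerate and store.
    data = generate(**params)
    save_version(data, params, data_path)
    return data
```

Calling `load_version(params, path, generate)` twice with the same parameters would run `generate` only once; changing any parameter changes the digest and triggers a regeneration. Note that md5 is used here only to detect a change in parameters, not for security.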

Overall, these functions could be implemented in a fancier way in a data generation pipeline, but even as they are they should speed up data loading by at least 5x.

2 participants