Description
Because Clowder allows multiple datasets to have the same name, we've frequently run into duplicate datasets being created during the Globus upload process. This causes downstream problems, such as extractors not triggering because a dataset ends up with incomplete data when its files are split across duplicates.
We've discussed implementing a method in the Clowder API -- getOrCreateDataset or similar -- that would return the ID of an existing dataset or create one if it didn't exist, but we've had pushback from the Clowder team since it would require locking in Mongo.
An alternative is to implement locking in the uploader itself, either via Postgres or via another package such as https://github.com/vaidik/sherlock.
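For illustration, a minimal sketch of how the uploader could serialize its check-then-create step with sherlock, assuming a Redis instance backing the locks and hypothetical `find_dataset_by_name` / `create_dataset` helpers wrapping the Clowder API (neither is part of the existing code):

```python
import sherlock
from sherlock import Lock

# Assumption: a Redis instance is available to back the locks.
# expire/timeout values are illustrative, not recommendations.
sherlock.configure(backend=sherlock.backends.REDIS, expire=120, timeout=30)

def get_or_create_dataset(dataset_name):
    """Return the ID of an existing dataset with this name, or create one.

    The lock is keyed on the dataset name so that concurrent uploads of
    files belonging to the same dataset serialize on the check-then-create
    step instead of racing and creating duplicates.
    """
    with Lock('dataset-create:' + dataset_name):
        existing = find_dataset_by_name(dataset_name)  # hypothetical Clowder API wrapper
        if existing is not None:
            return existing['id']
        return create_dataset(dataset_name)            # hypothetical Clowder API wrapper
```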
Completion criteria:
- Implement a distributed locking mechanism in the uploader (see the Postgres-based sketch after this list for one option)
- Update documentation
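If the uploader already depends on Postgres, the lock could instead be built on Postgres advisory locks rather than adding a new backend. A minimal sketch under that assumption, using psycopg2 and the same hypothetical `find_dataset_by_name` / `create_dataset` helpers as above:

```python
import hashlib
from contextlib import contextmanager

import psycopg2

@contextmanager
def advisory_lock(conn, name):
    """Hold a Postgres session-level advisory lock derived from `name`."""
    # pg_advisory_lock takes a signed 64-bit key; derive one from the dataset name.
    key = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], 'big', signed=True)
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield
    finally:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))

def get_or_create_dataset_pg(conn, dataset_name):
    """Same check-then-create logic, serialized on a Postgres advisory lock."""
    with advisory_lock(conn, dataset_name):
        existing = find_dataset_by_name(dataset_name)  # hypothetical Clowder API wrapper
        if existing is not None:
            return existing['id']
        return create_dataset(dataset_name)            # hypothetical Clowder API wrapper
```

Either variant keeps the locking entirely inside the uploader, so no changes to the Clowder API or Mongo are required.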