Implement locking for dataset creation in Globus uploader #446

Open
@craig-willis

Description

Because Clowder allows multiple datasets to have the same name, we've frequently run into duplicate datasets during the Globus upload process. This causes downstream problems: when a dataset's files end up split across the duplicates, extractors don't trigger because each dataset holds incomplete data.

We've discussed implementing a method in the Clowder API (getOrCreateDataset or similar) that would return the ID of an existing dataset or create the dataset if it doesn't exist, but the Clowder team has pushed back since it would require locking in Mongo.

An alternative is to implement locking in the uploader itself, either via Postgres or with another package such as https://github.com/vaidik/sherlock.
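To make the uploader-side option concrete, here is a minimal sketch of the lookup-then-create step guarded by a lock keyed on the dataset name. Everything in it is an assumption for illustration: `FakeClowder`, `find_dataset`, `create_dataset`, `dataset_lock`, and `get_or_create_dataset` are hypothetical names, not the Clowder API or the uploader's code. The lock is an in-process `threading.Lock` standing in for whatever distributed lock gets chosen (for example a Postgres advisory lock or a sherlock lock), so it only protects threads in one process.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# Stand-in for a distributed lock: one threading.Lock per dataset name.
# A real deployment would swap this for a lock shared across uploader
# processes (e.g. backed by Postgres or sherlock); that part is assumed.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


@contextmanager
def dataset_lock(name):
    """Hold the lock for `name` for the duration of the with-block."""
    with _locks_guard:          # protect the dict of per-name locks
        lock = _locks[name]
    with lock:
        yield


class FakeClowder:
    """Hypothetical in-memory client; like Clowder, it allows duplicate names."""

    def __init__(self):
        self.datasets = []      # list of (dataset_id, name)

    def find_dataset(self, name):
        for ds_id, ds_name in self.datasets:
            if ds_name == name:
                return ds_id
        return None

    def create_dataset(self, name):
        ds_id = "ds-%d" % (len(self.datasets) + 1)
        self.datasets.append((ds_id, name))
        return ds_id


def get_or_create_dataset(client, name):
    """Return the ID of the dataset called `name`, creating it at most once."""
    with dataset_lock(name):
        # The lookup and the create happen under the same lock, so two
        # uploads of the same dataset cannot both see "not found".
        ds_id = client.find_dataset(name)
        if ds_id is None:
            ds_id = client.create_dataset(name)
        return ds_id
```

The important design point is that the check and the create sit inside one critical section per dataset name; uploads to different datasets do not block each other.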

Completion criteria:

  • Implement distributed locking mechanism in uploader
  • Update documentation
