
What is the reason for this call to synchronize? #1388

Answered by rwightman
mitchellnw asked this question in Q&A

@mitchellnw it's not necessary in this train script and could be removed, since the only use of the output in logging is via .item() (an implicit synchronization). There might be a possible issue with the loss reduction for distributed training being unreliable without it; that should be checked.

Removing it could slightly increase the throughput, but you'd have to measure from start -> end of epoch and not rely on the avg_meter, as it'd potentially make the batch-to-batch timing unreliable...
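
For a sense of what epoch-level measurement could look like, here's a minimal sketch (hypothetical code, not the timm train script; it assumes a CUDA device and a standard PyTorch loader). GPU work is launched asynchronously, so timing a single batch without a synchronize mostly measures kernel launch time, while wall time over a whole epoch with one synchronize at the end still gives a usable samples/sec figure:

```python
import time
import torch

# Hypothetical example: measure throughput over a whole epoch rather than
# per batch, with a single synchronize before stopping the clock.
@torch.no_grad()
def epoch_throughput(model, loader, device="cuda"):
    model.eval()
    num_samples = 0
    start = time.time()
    for images, _ in loader:
        images = images.to(device, non_blocking=True)
        _ = model(images)
        num_samples += images.size(0)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for all queued kernels before reading the clock
    return num_samples / (time.time() - start)
```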

It is necessary in the bits_and_tpu branch, where taking the output of the model and accumulating it in another device tensor appears to cause a race:
https://github.com/rwightman/pytorch-image-models/blob/bits_and_tpu/train.py#L777-L787
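
Roughly, the two patterns being contrasted look like this (a hypothetical sketch assuming a CUDA device, not the timm or bits_and_tpu code): logging via .item() copies the scalar to the host and therefore blocks until the GPU has produced it, whereas accumulating into another device tensor just queues more asynchronous work, so an explicit synchronize may be needed before that accumulator is consumed elsewhere:

```python
import torch

# Hypothetical illustration, not the actual train script.

def log_loss(loss: torch.Tensor) -> float:
    # .item() copies the value to the host, which implicitly waits for the
    # kernels that produce `loss` to finish.
    return loss.item()

def accumulate_loss(running: torch.Tensor, loss: torch.Tensor) -> torch.Tensor:
    # Adding into another device tensor is just another queued async op;
    # nothing here forces the pending work to complete.
    running += loss.detach()
    # An explicit synchronize ensures the accumulation has finished before
    # the value is read or reduced elsewhere.
    torch.cuda.synchronize()
    return running
```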

Answer selected by mitchellnw