Replies: 1 comment 1 reply
-
@ydhongHIT looks like it is indeed wrong, but it was a simpler mistake/fix: I left off the negatives on the transpose indices, so I believe it should be `.transpose(-1, -2)` rather than `.transpose(1, 2)`, given the shape of the tensors going into that matmul. I'll see if this improves the training; the bottleneck transformer was not working very well compared to halo and I hadn't found the time to analyse it closely.
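For context, here is a minimal sketch of the shape flow around that output projection, assuming head-split tensors of shape `(B, num_heads, H * W, dim_head)` as in standard multi-head self-attention; the sizes and variable names are illustrative, not the exact timm code:

```python
import torch

# Illustrative sizes; not the exact timm configuration.
B, num_heads, dim_head, H, W = 2, 4, 16, 8, 8
dim_out = num_heads * dim_head

attn = torch.softmax(torch.randn(B, num_heads, H * W, H * W), dim=-1)
v = torch.randn(B, num_heads, H * W, dim_head)

out = attn @ v  # B, num_heads, H * W, dim_head

# transpose(1, 2) would swap num_heads and H * W, so the following reshape
# would scramble heads and spatial positions together.
# transpose(-1, -2) instead moves dim_head next to num_heads, so the reshape
# cleanly folds the heads back into the channel dimension.
out = out.transpose(-1, -2).reshape(B, dim_out, H, W)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```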
-
In https://github.com/rwightman/pytorch-image-models/blob/3f9959cdd28cb959980abf81fc4cf34f32e18399/timm/models/layers/bottleneck_attn.py#L125,
I think it should be `attn_out = (attn_out @ v).transpose(1, 2).reshape(B, H, W, self.dim_out).permute(0, 3, 1, 2)`.
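As a sanity check (a hedged sketch with made-up sizes, assuming the fix in the reply earlier amounts to `.transpose(-1, -2)` in place of `.transpose(1, 2)`), the permute-based rewrite proposed here and that fix produce the same tensor:

```python
import torch

# Illustrative sizes; attn_out stands for the result of (attn @ v).
B, num_heads, dim_head, H, W = 2, 4, 16, 8, 8
dim_out = num_heads * dim_head
attn_out = torch.randn(B, num_heads, H * W, dim_head)

# Fix proposed in this post: transpose, reshape to B, H, W, dim_out, then permute.
a = attn_out.transpose(1, 2).reshape(B, H, W, dim_out).permute(0, 3, 1, 2)
# Fix with negative transpose indices, reshaping directly to B, dim_out, H, W.
b = attn_out.transpose(-1, -2).reshape(B, dim_out, H, W)
print(torch.allclose(a, b))  # True
```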