[Question] scale for pos_embed in Halo and Bottleneck attention #912
-
Just noticed a small difference. In the implementation I'm comparing against, the query is scaled before computing the relative position logits:

```python
query = self.to_q(q_inp)
query *= scale
position = relative_position_logits(query)
```

But here in timm's bottleneck_attn.py and halo_attn.py, I think only the content logits are scaled:

```python
query = self.q(x)
attention = (query @ key.transpose(-1, -2)) * self.scale
attention = attention + self.pos_embed(query)
```

I did some basic tests comparing the two. This behavior may not matter, the model may fit its own weights around either form. I'm just wondering if there is any background for this?
-
@leondgarse you are correct: per the botnet gist and, more importantly, the paper this form of relative position embedding was based on (https://arxiv.org/abs/1904.09925), I should have scaled q for the relative position logits as well. It was an oversight, but it has worked well and seems stable enough. I've thought about fixing it, or at least providing an option to apply the scale to both, but have yet to do that. Thanks for the comparison table. Perhaps I should at least add a comment.
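For reference, a minimal sketch of the two orderings being discussed; `rel_pos_logits` is an illustrative stand-in for the relative position module, not timm's actual API:

```python
import torch

def logits_scale_shared(q, k, rel_pos_logits, scale):
    # Paper-style ordering: q is scaled once, so both the content and
    # the relative position logits see the scaled query.
    q = q * scale
    return q @ k.transpose(-1, -2) + rel_pos_logits(q)

def logits_scale_content_only(q, k, rel_pos_logits, scale):
    # timm's original ordering: only the content logits are scaled;
    # the position logits are computed from the unscaled query.
    return (q @ k.transpose(-1, -2)) * scale + rel_pos_logits(q)

# Quick shape check with a dummy position module.
q = torch.randn(2, 16, 8)  # (batch, tokens, dim_head)
k = torch.randn(2, 16, 8)
rel = lambda q: torch.zeros(q.shape[0], q.shape[1], q.shape[1])
a = logits_scale_shared(q, k, rel, scale=8 ** -0.5)        # (2, 16, 16)
b = logits_scale_content_only(q, k, rel, scale=8 ** -0.5)  # (2, 16, 16)
```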
-
@leondgarse in 02daf2a I added a bool flag; I need to investigate further before making any recommendations or changing defaults.
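To make the flag concrete, here is a rough sketch of how a `scale_pos_embed` bool could gate the two behaviors inside an attention forward (single head, shapes simplified; an illustration under those assumptions, not the actual bottleneck_attn.py/halo_attn.py code):

```python
import torch
import torch.nn as nn

class SimpleAttn(nn.Module):
    """Minimal single-head self-attention showing where scale_pos_embed
    changes the computation. pos_embed is any callable mapping the query
    (B, N, dim) to relative position logits (B, N, N); illustrative only."""
    def __init__(self, dim, pos_embed, scale_pos_embed=False):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.pos_embed = pos_embed
        self.scale_pos_embed = scale_pos_embed

    def forward(self, x):  # x: (B, N, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.scale_pos_embed:
            # scale shared by content and position logits
            attn = (q @ k.transpose(-1, -2) + self.pos_embed(q)) * self.scale
        else:
            # original behavior: scale only the content logits
            attn = (q @ k.transpose(-1, -2)) * self.scale + self.pos_embed(q)
        return attn.softmax(dim=-1) @ v
```

Since the relative position logits are linear in q, scaling the summed logits as in the `True` branch is equivalent to scaling q before computing them, which is the ordering shown in the question.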
-
@leondgarse I've just finished two runs that compare the two: same h-params, seed, etc., just the scale_pos_embed flag toggled.

In one run, for a `haloregnetz_b` model, the end result was within run-to-run noise: 81.03 (scale_pos_embed=False) vs 81.04 (scale_pos_embed=True).

The next one was a re-run of `eca_botnext26ts_256`; here I see results closer to yours, where the original config (`False`) edges out `scale_pos_embed=True` by a small amount, 79.27 vs 79.13.

I will leave it at False and probably won't revisit anytime soon, as it seems at best slightly better, at worst the same.