
[Question] scale for pos_embed in Halo and Bottleneck attention #912

Answered by rwightman
leondgarse asked this question in Q&A

@leondgarse I've just finished two comparisons of the two settings, with the same hyperparameters, seed, etc., and only scale_pos_embed toggled.

In the first, for a haloregnetz_b model, the end results were within run-to-run noise: 81.03 (scale_pos_embed=False) vs 81.04 (scale_pos_embed=True).

The second was a re-run of eca_botnext26ts_256. Here I see something closer to your results: the original config (scale_pos_embed=False) edges out scale_pos_embed=True by a small amount, 79.27 vs 79.13.

I will leave it at False and probably won't revisit anytime soon, as False seems at best slightly better and at worst the same.
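
For anyone reading this later without the code open, here is a minimal sketch of what the toggle changes. It assumes scale_pos_embed decides whether the relative position logits are multiplied by the same attention scale as the q·k content logits, or added after scaling. The helper names (attn_logits, softmax) and the toy numbers are hypothetical and for illustration only; they are not the library's API and not the values from the runs above.

```python
import math


def attn_logits(qk, pos, scale, scale_pos_embed=False):
    """Combine content logits (qk) and position logits (pos) for one query.

    Assumption: with scale_pos_embed=True the position term is scaled
    together with the content term; with False it is added unscaled.
    """
    if scale_pos_embed:
        return [(c + p) * scale for c, p in zip(qk, pos)]
    return [c * scale + p for c, p in zip(qk, pos)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


dim_head = 16
scale = dim_head ** -0.5       # usual 1/sqrt(d) attention scale
qk = [2.0, 0.0, -2.0]          # toy content logits for three keys
pos = [0.0, 4.0, 0.0]          # toy relative position logits

unscaled = attn_logits(qk, pos, scale, scale_pos_embed=False)
scaled = attn_logits(qk, pos, scale, scale_pos_embed=True)

# Attention weights after softmax for each variant
w_unscaled = softmax(unscaled)
w_scaled = softmax(scaled)
```

In this toy case the position term has more influence on the attention weights when it is left unscaled. The network can learn to compensate through the magnitude of the position embeddings, which would be consistent with the two settings ending up so close in accuracy.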

Answer selected by leondgarse