This repository was archived by the owner on Feb 18, 2025. It is now read-only.
I found that different hyper-parameters (number of layers, dimension, etc.) are used for different models.
Can you clarify how the baselines are compared?
@mlpen We ran an extensive hyperparameter search for every single model to make sure we got the best possible results from each. Especially for the CIFAR task, where the results of the different models are close, we wanted a rather large grid, searched separately for each model. That is why you see different values for the number of layers, number of heads, etc. In short, we prioritized getting the best possible result from each model over keeping the number of trainable parameters similar across models.
Hope this answers your question. Let us know if you have any issues reproducing the results, or if you end up with hyperparameters for any of these models that give better results than what we reported in the paper.
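For reference, the per-model search described above amounts to a grid search over architectural hyperparameters. A minimal sketch of that procedure is below; the grid values and the scoring function are illustrative placeholders, not the actual grid or training loop used for the paper.

```python
from itertools import product

# Hypothetical hyperparameter grid for one model on the CIFAR task.
# These values are illustrative, not the grid used in the paper.
grid = {
    "num_layers": [1, 2, 3, 4],
    "num_heads": [2, 4, 8],
    "emb_dim": [64, 128, 256],
}

def grid_search(grid, score_fn):
    """Return the best config (highest score) over the Cartesian product."""
    keys = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = score_fn(cfg)  # in practice: train the model and eval accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy scoring function standing in for a full training run.
best, score = grid_search(grid, lambda c: c["num_layers"] * c["num_heads"])
```

Running this grid separately per model is how each baseline can end up with different layer/head/dimension settings in its config file.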
We had extensive hp search for every single model to make sure that we have best possible results from each.
Were the hyperparameters from the hp search used to make Table 1, or was the hp search done after Table 1? I'm confused because the article says all Transformer models used the same fixed hyperparameters, but the hp search gave different hyperparameters for each model:
"The large search space motivates us to follow a set of fixed hyperparameters (number of layers, heads, embedding dimensions, etc.) for all models."
Hi,
For example,
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/longformer_base.py
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/performer_base.py
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/reformer_base.py