Hi! I see that DeBERTa-v3 uses relative-position embeddings, so it can take in a larger context than traditional BERT. Have you tried pretraining DeBERTa-v3 with a context length of 1024 or larger?
If I need to pretrain DeBERTa-v3 from scratch with a larger context length (e.g., 1024), are there any modifications I should make besides the training script? A sketch of the kind of change I have in mind is below.
Thanks for any help!
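For reference, here is a minimal sketch of the sort of configuration change I am asking about, assuming the Hugging Face `transformers` implementation rather than this repo's own pretraining code (which may expose different knobs). The parameter names follow the released `microsoft/deberta-v3-base` config; the MLM head is used only for illustration, since DeBERTa-v3's actual pretraining uses ELECTRA-style replaced-token detection.

```python
# Hedged sketch: widening the context window before pretraining from scratch.
# Assumes the Hugging Face `transformers` DeBERTa-v2 classes (DeBERTa-v3 reuses them).
from transformers import DebertaV2Config, DebertaV2ForMaskedLM

# Start from the released v3-base config so relative-attention settings
# (relative_attention=True, position_buckets, etc.) are carried over.
config = DebertaV2Config.from_pretrained("microsoft/deberta-v3-base")

# Raise the absolute-position cap; the relative-attention mechanism itself
# does not hard-code a 512-token limit.
config.max_position_embeddings = 1024

# Fresh weights for pretraining from scratch (MLM head here is illustrative only;
# the real v3 objective is replaced-token detection with a generator/discriminator).
model = DebertaV2ForMaskedLM(config)
print(config.max_position_embeddings, config.relative_attention)
```

Is changing `max_position_embeddings` like this sufficient, or are there other settings (e.g., the relative-position bucket count) that should also be scaled when moving to 1024 tokens?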