
Loading a checkpoint under model parallelism causes a word embedding size mismatch #3

Open

Billijk opened this issue Aug 13, 2021 · 1 comment

Billijk commented Aug 13, 2021

Hi author,

I tried splitting a checkpoint with change_mp.py and then loading it under model parallelism, but loading fails with a word embedding size mismatch. After reading the code, I found that at load time the vocabulary size is padded to a multiple of a certain number (to improve compute efficiency), and that number is args.make_vocab_size_divisible_by * mpu.get_model_parallel_world_size(). So when the MP degree changes, the padded vocabulary size changes as well, and the model parameters can no longer be loaded.

before = num_tokens
after = before
multiple = args.make_vocab_size_divisible_by * \
    mpu.get_model_parallel_world_size()
while (after % multiple) != 0:
    after += 1
print_rank_0('> padded vocab (size: {}) with {} dummy '
             'tokens (new size: {})'.format(
                 before, after - before, after))
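
To make the mismatch concrete, here is a minimal standalone sketch of the same padding rule, with assumed numbers (a hypothetical 50257-token vocabulary and make_vocab_size_divisible_by=128; both are illustrative, not taken from this repo's config):

def padded_vocab_size(num_tokens, divisible_by, mp_world_size):
    # Same padding rule as the snippet above: round the vocab size up
    # to the next multiple of divisible_by * mp_world_size.
    multiple = divisible_by * mp_world_size
    after = num_tokens
    while after % multiple != 0:
        after += 1
    return after

# Hypothetical numbers: 50257-token vocab, make_vocab_size_divisible_by=128.
print(padded_vocab_size(50257, 128, 1))  # 50304
print(padded_vocab_size(50257, 128, 4))  # 50688 -- a different embedding shape, so loading fails

Since the padded size depends on mp_world_size, a checkpoint saved at one MP degree has an embedding matrix whose first dimension no longer matches the model built at another MP degree.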

A temporary fix is to hard-code the multiple variable at line 671 here to args.make_vocab_size_divisible_by.
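
A minimal sketch of that fix, assuming the padding code shown above (only the assignment to multiple changes):

before = num_tokens
after = before
# Temporary fix: drop the model-parallel factor so the padded vocab size
# no longer depends on the MP degree, and checkpoints load across MP settings.
multiple = args.make_vocab_size_divisible_by
while (after % multiple) != 0:
    after += 1

One caveat: with a vocab-parallel embedding, the padded size must still split evenly across model-parallel ranks, so this only works when the MP world size divides make_vocab_size_divisible_by (true for the usual value of 128 and MP degrees of 2, 4, or 8).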

makeme-zgz commented

Did you manage to finetune or pretrain successfully? @Billijk On my side, the checkpoint obtained from the download link raises a runtime error when loading.
