Commit 763f9d7

the fix replicates biases too if they exist (e.g. Qwen) (#328)
This fix extends KV-head replication to models whose KV projection layers have biases in addition to weights (such as the Qwen family): the biases are now replicated along with the weights. KV replication doubles the throughput of Qwen/Qwen2.5-1.5B, which has 2 KV heads, when compiled with TS4. The script has been tested successfully with Qwen/Qwen2.5-1.5B and meta-llama/Llama-3.2-1B-Instruct.

Signed-off-by: quic-morteza <[email protected]>
1 parent d7a2772 commit 763f9d7

1 file changed: +4 -0 lines changed


scripts/replicate_kv_head/replicate_kv_heads.py

+4
@@ -63,6 +63,10 @@ def duplicate_weights_for_linear_layer(
     layer.weight.data = torch.repeat_interleave(
         layer.weight.data.view(orig_kv_heads, head_dim, hidden_size), repeat, 0
     ).view(new_kv_heads * head_dim, hidden_size)
+    if layer.bias is not None:
+        layer.bias.data = torch.repeat_interleave(layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0).view(
+            new_kv_heads * head_dim
+        )
 
 
 def main(args):
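
To illustrate why the bias has to be replicated along with the weight, here is a minimal, self-contained sketch. It is not the script's actual function: `replicate_kv_linear`, its signature, and the toy dimensions are assumptions made for this example, and only the `torch.repeat_interleave` pattern is taken from the diff. It checks that, after replication, each copied head produces the same output as the head it was copied from.

```python
import torch
from torch import nn


def replicate_kv_linear(
    layer: nn.Linear, orig_kv_heads: int, repeat: int, head_dim: int, hidden_size: int
) -> None:
    """Hypothetical helper: repeat each KV head's weight rows (and bias entries,
    if a bias exists) `repeat` times, in place."""
    new_kv_heads = orig_kv_heads * repeat
    layer.weight.data = torch.repeat_interleave(
        layer.weight.data.view(orig_kv_heads, head_dim, hidden_size), repeat, 0
    ).view(new_kv_heads * head_dim, hidden_size)
    # Without this branch the bias would keep its old length and no longer
    # match the replicated weight.
    if layer.bias is not None:
        layer.bias.data = torch.repeat_interleave(
            layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0
        ).view(new_kv_heads * head_dim)
    layer.out_features = new_kv_heads * head_dim


# Toy dimensions chosen for illustration only.
torch.manual_seed(0)
orig_kv_heads, repeat, head_dim, hidden_size = 2, 2, 4, 8
k_proj = nn.Linear(hidden_size, orig_kv_heads * head_dim, bias=True)
x = torch.randn(3, hidden_size)

with torch.no_grad():
    before = k_proj(x).view(3, orig_kv_heads, head_dim)
    replicate_kv_linear(k_proj, orig_kv_heads, repeat, head_dim, hidden_size)
    after = k_proj(x).view(3, orig_kv_heads * repeat, head_dim)

# Each original head should now appear `repeat` times in a row.
expected = torch.repeat_interleave(before, repeat, dim=1)
matches = torch.allclose(after, expected, atol=1e-6)
```

In this sketch, `matches` is true only when weights and biases are replicated together; if the bias branch were skipped on a layer that has a bias, the forward pass would fail on mismatched shapes.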
