Commit 61b1445
BugFix: Fix reshape error for llama swiftkv models (#432)

This fixes the issue with higher-BS compilation for SwiftKV models.

```
Compiler command: ['/opt/qti-aic/exec/qaic-exec', '-aic-hw', '-aic-hw-version=2.0',
'-m=/prj/qct/aisyssol_scratch/users/shagsood/quic_shagun/LlamaSwiftKVForCausalLM-a5879ebc0e59ab40/LlamaSwiftKVForCausalLM.onnx',
'-compile-only', '-retained-state', '-convert-to-fp16', '-aic-num-cores=16',
'-network-specialization-config=/prj/qct/aisyssol_scratch/users/shagsood/quic_shagun/LlamaSwiftKVForCausalLM-a5879ebc0e59ab40/qpc-60f86f912a187346/specializations.json',
'-custom-IO-list-file=/prj/qct/aisyssol_scratch/users/shagsood/quic_shagun/LlamaSwiftKVForCausalLM-a5879ebc0e59ab40/qpc-60f86f912a187346/custom_io.yaml',
'-mdp-load-partition-config=/prj/qct/aisyssol_scratch/users/shagsood/quic_shagun/LlamaSwiftKVForCausalLM-a5879ebc0e59ab40/qpc-60f86f912a187346/mdp_ts_4.json',
'-aic-binary-dir=/prj/qct/aisyssol_scratch/users/shagsood/quic_shagun/LlamaSwiftKVForCausalLM-a5879ebc0e59ab40/qpc-60f86f912a187346/qpc']

Compiler exitcode: 1
Compiler stderr:
QAIC_ERROR:
Error message: [Operator-'/model/layers.16/self_attn/Reshape'] : Reshape: input shape (4, 4, 4096) and output shape (4, 1, 32, 128) have different number of elements (in 65536 vs. out 16384)
Unable to AddNodesToGraphFromModel
```

Tested with BS4; able to compile now.

Signed-off-by: quic-shagun <[email protected]>
1 parent 740f7c2 commit 61b1445

File tree

1 file changed: +2 −2 lines changed


QEfficient/transformers/models/llama_swiftkv/modeling_llama_swiftkv.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -371,8 +371,8 @@ def forward(
             hidden_states = orig_hidden_states[torch.arange(orig_hidden_states.shape[0]).reshape(-1, 1), last_pos_id, :]
             causal_mask = causal_mask[torch.arange(orig_hidden_states.shape[0]).reshape(-1, 1), :, last_pos_id, :]
         else:
-            hidden_states = orig_hidden_states[torch.arange(bsz), last_pos_id, :]
-            causal_mask = causal_mask[torch.arange(bsz), :, last_pos_id, :]
+            hidden_states = orig_hidden_states[torch.arange(bsz).reshape(-1, 1), last_pos_id, :]
+            causal_mask = causal_mask[torch.arange(bsz).reshape(-1, 1), :, last_pos_id, :]

         hidden_states, next_decoder_cache = self._run_swiftkv_layers(
             hidden_states, position_ids, past_key_values, causal_mask, batch_index
```
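The root cause is an advanced-indexing broadcast: a 1-D `torch.arange(bsz)` index broadcasts against a `(bsz, 1)`-shaped `last_pos_id` to produce `bsz * bsz` gathered rows instead of `bsz`, which later trips the element-count check in the Reshape op. A minimal sketch of the shape behavior, using small illustrative tensors (the names and sizes here are assumptions, not the model's actual tensors):

```python
import torch

# Illustrative shapes: batch of hidden states, one "last position" per batch row.
bsz, seq_len, hidden = 4, 4, 8
orig_hidden_states = torch.randn(bsz, seq_len, hidden)
last_pos_id = torch.tensor([[3], [1], [2], [0]])  # shape (bsz, 1)

# Before the fix: index shapes (bsz,) and (bsz, 1) broadcast to (bsz, bsz),
# gathering bsz*bsz rows -- the source of the 65536-vs-16384 mismatch.
broken = orig_hidden_states[torch.arange(bsz), last_pos_id, :]
print(broken.shape)  # torch.Size([4, 4, 8])

# After the fix: (bsz, 1) and (bsz, 1) broadcast to (bsz, 1),
# gathering exactly one row per batch element.
fixed = orig_hidden_states[torch.arange(bsz).reshape(-1, 1), last_pos_id, :]
print(fixed.shape)  # torch.Size([4, 1, 8])
```

With the real model's hidden size of 4096, the broken indexing yields `(4, 4, 4096)` = 65536 elements where the downstream Reshape expects `(4, 1, 32, 128)` = 16384, matching the compiler error above.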

Comments (0)