
[Bug]: bad inference result after gptq using llm-compressor on Qwen2.5-72b model #1875

@yoonlee888

Description


⚙️ Your current environment

The output of `python collect_env.py` was not provided.

🐛 Describe the bug

I want to quantize the Qwen2.5-72B model using llm-compressor. I quantized the model with the following recipe:

from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    ignore=["lm_head"],
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=4, strategy="group", group_size=64),
        )
    },
)
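For reference, `strategy="group"` with `group_size=64` means each contiguous run of 64 weights along a row shares one quantization scale. A minimal sketch of the matching symmetric int4 fake quantization (my own illustration, ignoring GPTQ's Hessian-based error correction):

import torch

def fake_quant_w4_group(w: torch.Tensor, group_size: int = 64) -> torch.Tensor:
    """Symmetric int4 fake-quantization, one scale per group of `group_size` weights."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w = w.reshape(out_features, in_features // group_size, group_size)
    # int4 symmetric range is [-8, 7]; the scale maps each group's max abs value to 7.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return (q * scale).reshape(out_features, in_features)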

Data processing:

from datasets import load_dataset

# `tokenizer` is assumed to be loaded beforehand, e.g. with
# transformers.AutoTokenizer.from_pretrained on the Qwen2.5-72B checkpoint.
DATASET_ID = "/workspace/dataset/quant/qwen2-72b-chat/common_26000_plus_3000lc_extracted_3000_4096.jsonl"
DATASET_SPLIT = "train"

# Select number of samples. 512 samples is a good place to start.
# Increasing the number of samples can improve accuracy.
NUM_CALIBRATION_SAMPLES = 300
MAX_SEQUENCE_LENGTH = 4096

# Load dataset and preprocess.
ds = load_dataset("json", data_files=DATASET_ID, split=f"{DATASET_SPLIT}[:{NUM_CALIBRATION_SAMPLES}]")
# ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
# ds = ds.shuffle(seed=42)


def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
            add_generation_prompt=True
        )
    }


ds = ds.map(preprocess)


# Tokenize inputs.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )


ds = ds.map(tokenize, remove_columns=ds.column_names)
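For completeness, the quantization step that consumes this dataset would look roughly like the following (a sketch based on the standard llm-compressor oneshot flow; the report omits this step, so the model checkpoint and output path below are placeholders):

from llmcompressor import oneshot
from transformers import AutoModelForCausalLM

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"  # placeholder; the exact checkpoint is not named in the report

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

SAVE_DIR = "Qwen2.5-72B-W4A16-G64"  # placeholder output path
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)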

The inference result is always strange once the output grows longer than about 400 tokens. The generation starts out normal and then degrades into abnormal output, as in the screenshot:

[screenshot of degraded model output omitted]

If I use AutoGPTQ instead, this problem does not occur; the result is always normal. Any idea what is going on?
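A minimal generation check along these lines should surface the reported failure (a sketch; QUANT_DIR and the prompt are placeholders, and max_new_tokens is set well past the ~400-token threshold):

from transformers import AutoModelForCausalLM, AutoTokenizer

QUANT_DIR = "Qwen2.5-72B-W4A16-G64"  # hypothetical path of the quantized model

tokenizer = AutoTokenizer.from_pretrained(QUANT_DIR)
model = AutoModelForCausalLM.from_pretrained(QUANT_DIR, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a detailed, multi-paragraph essay on the history of computing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Degradation reportedly appears past ~400 generated tokens, so go well beyond that.
output = model.generate(input_ids, max_new_tokens=800, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))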

🛠️ Steps to reproduce

No response

Metadata


Labels

bug: Something isn't working
gptq: For any PR / issue related to GPTQ support
wNa16: Anything related to weight-only int-quantized support
