
[Bug]: bad inference result after gptq using llm-compressor on Qwen2.5-72b model #1875

@yoonlee888

Description


⚙️ Your current environment

The output of `python collect_env.py` was not provided.

🐛 Describe the bug

I want to quantize the Qwen2.5-72B model using llm-compressor. I quantized the model with the following recipe:

from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    ignore=["lm_head"],
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=4, strategy="group", group_size=64),
        )
    },
)
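For reference, `strategy="group"` with `group_size=64` means each contiguous run of 64 weights along a row shares one quantization scale. A minimal sketch of the matching symmetric int4 fake quantization (my own illustration, ignoring GPTQ's Hessian-based error correction):

import torch

def fake_quant_w4_group(w: torch.Tensor, group_size: int = 64) -> torch.Tensor:
    """Symmetric int4 fake-quantization, one scale per group of `group_size` weights."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w = w.reshape(out_features, in_features // group_size, group_size)
    # int4 symmetric range is [-8, 7]; the scale maps each group's max abs value to 7.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return (q * scale).reshape(out_features, in_features)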

Data processing:

from datasets import load_dataset

# `tokenizer` is assumed to be loaded beforehand, e.g. with
# transformers.AutoTokenizer.from_pretrained on the Qwen2.5-72B checkpoint.
DATASET_ID = "/workspace/dataset/quant/qwen2-72b-chat/common_26000_plus_3000lc_extracted_3000_4096.jsonl"
DATASET_SPLIT = "train"

# Select number of samples. 512 samples is a good place to start.
# Increasing the number of samples can improve accuracy.
NUM_CALIBRATION_SAMPLES = 300
MAX_SEQUENCE_LENGTH = 4096

# Load dataset and preprocess.
ds = load_dataset("json", data_files=DATASET_ID, split=f"{DATASET_SPLIT}[:{NUM_CALIBRATION_SAMPLES}]")
# ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
# ds = ds.shuffle(seed=42)


def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
            add_generation_prompt=True
        )
    }


ds = ds.map(preprocess)


# Tokenize inputs.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )


ds = ds.map(tokenize, remove_columns=ds.column_names)
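For completeness, the quantization step that consumes this dataset would look roughly like the following (a sketch based on the standard llm-compressor oneshot flow; the report omits this step, so the model checkpoint and output path below are placeholders):

from llmcompressor import oneshot
from transformers import AutoModelForCausalLM

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"  # placeholder; the exact checkpoint is not named in the report

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

SAVE_DIR = "Qwen2.5-72B-W4A16-G64"  # placeholder output path
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)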

The inference result is always strange once the output grows longer than about 400 tokens. The generation starts out normal and then degrades into abnormal output, as in the screenshot:

[screenshot of degraded model output omitted]

If I use AutoGPTQ instead, this problem does not occur; the result is always normal. Any idea what is going on?
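A minimal generation check along these lines should surface the reported failure (a sketch; QUANT_DIR and the prompt are placeholders, and max_new_tokens is set well past the ~400-token threshold):

from transformers import AutoModelForCausalLM, AutoTokenizer

QUANT_DIR = "Qwen2.5-72B-W4A16-G64"  # hypothetical path of the quantized model

tokenizer = AutoTokenizer.from_pretrained(QUANT_DIR)
model = AutoModelForCausalLM.from_pretrained(QUANT_DIR, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a detailed, multi-paragraph essay on the history of computing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Degradation reportedly appears past ~400 generated tokens, so go well beyond that.
output = model.generate(input_ids, max_new_tokens=800, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))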

🛠️ Steps to reproduce

No response

Metadata


Labels

bug: Something isn't working
gptq: For any PR / issue related to GPTQ support
wNa16: Anything related to weight-only int-quantized support
