Open
Labels
bug (Something isn't working) · gptq (For any PR / issue related to GPTQ support) · wNa16 (Anything related to weight-only int-quantized support)
Description
⚙️ Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
I want to quantize the Qwen2.5-72B model using llm-compressor. I quantized the model with the following recipe:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from compressed_tensors.quantization import QuantizationScheme, QuantizationArgs

recipe = GPTQModifier(
    ignore=["lm_head"],
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=4, strategy="group", group_size=64),
        )
    },
)
```
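The preprocessing below references `tokenizer`, which is loaded the usual way; a minimal sketch, assuming the stock Hugging Face checkpoint (my actual script may use a local path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"  # assumption: stand-in for my local model path

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
```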
Data processing:

```python
from datasets import load_dataset

DATASET_ID = "/workspace/dataset/quant/qwen2-72b-chat/common_26000_plus_3000lc_extracted_3000_4096.jsonl"
DATASET_SPLIT = "train"

# Select number of samples. 512 samples is a good place to start.
# Increasing the number of samples can improve accuracy.
NUM_CALIBRATION_SAMPLES = 300
MAX_SEQUENCE_LENGTH = 4096

# Load dataset and preprocess.
ds = load_dataset("json", data_files=DATASET_ID, split=f"{DATASET_SPLIT}[:{NUM_CALIBRATION_SAMPLES}]")
# ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
# ds = ds.shuffle(seed=42)

def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
            add_generation_prompt=True,
        )
    }

ds = ds.map(preprocess)

# Tokenize inputs.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)
```
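For completeness, the recipe and calibration set are then passed to llm-compressor's oneshot entry point; a minimal sketch following the public examples (the save directory is an assumption, and the exact import path may vary between llm-compressor versions):

```python
from llmcompressor import oneshot

# Run one-shot GPTQ calibration with the recipe and tokenized dataset.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

# Save the compressed model and tokenizer (path is an assumption).
SAVE_DIR = "Qwen2.5-72B-W4A16-G64"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```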
The inference result is always strange when the output is longer than about 400 tokens: the generation starts out normal and then degenerates into abnormal output.
If I use AutoGPTQ instead, this problem does not exist and the result is always normal. Any idea on this?
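For reference, a hypothetical sketch of how the degeneration can be observed with vLLM (the model path and prompt are assumptions, not my exact setup; the key point is forcing a completion longer than ~400 tokens):

```python
from vllm import LLM, SamplingParams

# Hypothetical repro: load the compressed checkpoint and force a long completion.
llm = LLM(model="Qwen2.5-72B-W4A16-G64")  # assumption: path to the quantized model
params = SamplingParams(temperature=0.7, max_tokens=1024)

outputs = llm.generate(["Write a detailed essay about the history of computing."], params)
# The first few hundred tokens look normal; beyond ~400 tokens the text degenerates.
print(outputs[0].outputs[0].text)
```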
🛠️ Steps to reproduce
No response