feat: added support for Mistral models in Pytorch workflow and HF quantization script #3843


Closed

Conversation

@hypdeb hypdeb commented Apr 24, 2025

No description provided.

@hypdeb hypdeb requested a review from FrankD412 April 24, 2025 15:48
@hypdeb hypdeb self-assigned this Apr 24, 2025
hypdeb commented Apr 24, 2025

Hello @litaotju, do you know who is the best person to review modelling changes in the PyTorch workflow?


## Quantizing from the HuggingFace format to the HuggingFace format
It is useful to be able to quantize HuggingFace models without changing their format, for example if you plan to use them in TensorRT-LLM's PyTorch-based workflow. The `quantize_hf_to_hf.py` script serves that purpose. It is a reduced version of ModelOpt's [example post-training quantization script](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/llm_ptq/hf_ptq.py); please refer to the original for a more up-to-date version. For example, it can be used to quantize a model to `fp8` for tensor-parallelism 4 (the flags below are illustrative; check the script's `--help` for the actual argument names):
```
# Illustrative invocation; the flag names are assumptions modeled on
# ModelOpt's hf_ptq.py example and are not verified against this script.
python quantize_hf_to_hf.py \
    --model_dir <path/to/hf/model> \
    --qformat fp8 \
    --tp_size 4 \
    --output_dir <path/to/quantized/output>
```
Collaborator

Hi @hypdeb, I don't think it's a good idea to copy scripts from ModelOpt to TRT-LLM. Can we add links to ModelOpt instead, like this one: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/torch.md#quantization

Collaborator Author

I agree that copying sources is generally not a good idea. I did it here because this is an example usage script: it is essentially meant to be copied and adjusted by users, or at least that's how I see it. What do you think?

Collaborator

Could you please create a PR for the Mistral models only? We can merge that first. As for the HF quantization script, I think the user workflow is still under debate.

Collaborator Author

Here is the MR with only the added Mistral modelling code: #3845

Collaborator Author

> For the HF quantization script, I think it's still under debate about the user workflow.

Could you maybe CC me on these discussions or ping me on Slack about this? I am interested because, in the long term, I would like to add a more lightweight, TRT-LLM-independent quantization flow to my automation.

@hypdeb hypdeb closed this Apr 26, 2025