
[WIP] Add AWQ quantization with QDQLayout support for ExecuTorch #2399


Open · wants to merge 1 commit into base: main

Conversation

kimishpatel (Contributor) commented Jun 18, 2025

Still WIP

A good chunk of this was written by Claude.

This commit implements AWQ (Activation-aware Weight Quantization) with QDQLayout support and 8-bit dynamic activation quantization for ExecuTorch compatibility, addressing GitHub issue #2388.

Key features:

  • AWQObserverQDQ: Enhanced observer with 8-bit dynamic activation quantization
  • QDQLayout integration for ExecuTorch compatibility
  • AWQQDQConfig: Configuration for the new quantization approach
  • Complete API integration with quantize_() function
  • Comprehensive test suite with 8 test cases
  • Usage example demonstrating the full workflow

The implementation extends the existing AWQ algorithm to support:

  1. QDQLayout (Quantize-Dequantize Layout) for ExecuTorch export
  2. 8-bit dynamic quantization of activation scales for better compression
  3. Scale search algorithm maintaining AWQ's core optimization approach
  4. Seamless integration with existing torchao quantization infrastructure

Usage:

```python
from torchao.prototype.awq import (
    insert_awq_observer_qdq_, AWQQDQConfig, _is_awq_observed_linear_qdq
)
from torchao.quantization import quantize_

insert_awq_observer_qdq_(model, ...)
quantize_(model, AWQQDQConfig(...), filter_fn=_is_awq_observed_linear_qdq)
```
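
For context, a slightly fuller sketch of the intended calibration flow. The toy model, batch shapes, and calibration loop are illustrative assumptions; the constructor arguments stay elided here because they are elided in the PR description too, so this is not runnable as-is:

```python
import torch
from torchao.prototype.awq import (
    insert_awq_observer_qdq_, AWQQDQConfig, _is_awq_observed_linear_qdq
)
from torchao.quantization import quantize_

# Toy stand-ins; real usage would target an actual model and dataset.
model = torch.nn.Sequential(torch.nn.Linear(512, 512)).eval()
calibration_batches = [torch.randn(8, 512) for _ in range(16)]

# 1. Swap eligible nn.Linear modules for AWQ observed variants
#    (arguments elided, as in the PR description).
insert_awq_observer_qdq_(model, ...)

# 2. Run calibration data so the observers can search for equalization scales.
with torch.no_grad():
    for batch in calibration_batches:
        model(batch)

# 3. Convert the observed modules to QDQLayout quantized weights for ExecuTorch export.
quantize_(model, AWQQDQConfig(...), filter_fn=_is_awq_observed_linear_qdq)
```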


pytorch-bot bot commented Jun 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2399

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit b137179 with merge base e4f2715:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) Jun 18, 2025
kimishpatel requested a review from metascroy June 18, 2025 03:10
kimishpatel (Contributor, Author) commented:

@metascroy don't need to review quite yet, just wanted to put it up

jerryzh168 (Contributor) commented:

ah thanks @kimishpatel, I also started thinking about this one; I was planning to have a general AWQConfig that has a base_config of Int8DynamicActivationInt4Weight or other base configs, but it's OK to start with this first as well
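
For illustration, a minimal sketch of what such a wrapper config might look like. The class shape and import path are assumptions on my part, not the actual API proposed in #2400:

```python
from dataclasses import dataclass
from torchao.core.config import AOBaseConfig  # assumed import path

@dataclass
class AWQConfig(AOBaseConfig):
    # Any existing torchao quantization config, e.g. one describing
    # int8 dynamic activation + int4 weight; AWQ's equalization-scale
    # search would run on top of whatever this base config specifies.
    base_config: AOBaseConfig
```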

Inline review thread on the following diff excerpt (the excerpt ends mid-call):

```python
layout = QDQLayout()
tensor_dtype = torch.int8

quantized_weight = to_affine_quantized_intx(
```
jerryzh168 (Contributor) commented:

one thing that's slightly different is that now we have to add dynamic quant here in the observer as well, so that the scale is calculated in the context of an input that will be quantized later.
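
A minimal sketch of that idea, with hypothetical helper names (not code from this PR): during the observer's scale search, quantize-dequantize the activation to int8 first, so the equalization scale is chosen against the input the quantized kernel will actually see.

```python
import torch

def fake_quant_int8_per_token(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-token int8 quantize-dequantize, mimicking 8-bit
    # dynamic activation quantization.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127) * scale

def awq_search_loss(x: torch.Tensor, w_dequant: torch.Tensor, w_float: torch.Tensor) -> torch.Tensor:
    # Evaluate a candidate equalization scale by comparing the quantized
    # path against the float reference, using the *quantized* activation.
    x_q = fake_quant_int8_per_token(x)
    return (x_q @ w_dequant.t() - x @ w_float.t()).pow(2).mean()
```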

kimishpatel (Contributor, Author) replied:

yeah, that's exactly it, right? I had thought about this but forgot to instruct Claude

jerryzh168 (Contributor) replied:

oh I see, I think we can refactor AWQ to work with general ao configs, and it might be easier for Claude to add the new functionality: #2400

kimishpatel (Contributor, Author) replied:

My plan was to ask Claude to do a refactor to consolidate code in a subsequent commit. I can take a look at your PR too. Or if you want to commandeer this one, I am fine with that as well.

BTW I thought about what you said about dynamic act quant. The thing that worries me is that if we derive a scale post dynamic act quantization, then we cannot apply that scale before activation quantization. I mean we could, but it won't be the same thing. So I am wondering whether we should do that or not. Might just be worth doing an experiment.
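
To make the ordering concern concrete, a toy example (all names hypothetical): dynamic quantization is non-linear, so quantizing x / s is not the same as quantizing x and then dividing by s, which is why a scale searched post-quantization cannot simply be moved in front of the quantizer.

```python
import torch

def qdq(x: torch.Tensor) -> torch.Tensor:
    # Toy symmetric per-tensor int8 quantize-dequantize.
    scale = x.abs().amax().clamp(min=1e-6) / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127) * scale

x = torch.randn(4, 8)
s = torch.rand(8) + 0.5  # candidate AWQ equalization scale

applied_before = qdq(x / s)  # scale folded in before dynamic quant
applied_after = qdq(x) / s   # what a post-quantization search effectively measures

print(torch.allclose(applied_before, applied_after))  # generally False
```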

kimishpatel (Contributor, Author) commented:

> Int8DynamicActivationInt4Weight

Happy to make changes as seems appropriate. This was mainly an exercise in leveraging Claude Code more than an attempt to implement AWQ for dynamic act quant.
