
[WIP] Add AWQ quantization with QDQLayout support for ExecuTorch #2399


Open · wants to merge 1 commit into base: main

Conversation

kimishpatel (Contributor) commented Jun 18, 2025

Still WIP

A good chunk of this was written by Claude.

This commit implements AWQ (Activation-aware Weight Quantization) with QDQLayout support and 8-bit dynamic activation quantization for ExecuTorch compatibility, addressing GitHub issue #2388.

Key features:

  • AWQObserverQDQ: Enhanced observer with 8-bit dynamic activation quantization
  • QDQLayout integration for ExecuTorch compatibility
  • AWQQDQConfig: Configuration for the new quantization approach
  • Complete API integration with quantize_() function
  • Comprehensive test suite with 8 test cases
  • Usage example demonstrating the full workflow

The implementation extends the existing AWQ algorithm to support:

  1. QDQLayout (Quantize-Dequantize Layout) for ExecuTorch export
  2. 8-bit dynamic quantization of activation scales for better compression
  3. Scale search algorithm maintaining AWQ's core optimization approach
  4. Seamless integration with existing torchao quantization infrastructure

Usage:

```python
from torchao.prototype.awq import (
    insert_awq_observer_qdq_, AWQQDQConfig, _is_awq_observed_linear_qdq
)
from torchao.quantization import quantize_

insert_awq_observer_qdq_(model, ...)
quantize_(model, AWQQDQConfig(...), filter_fn=_is_awq_observed_linear_qdq)
```
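
For context, a slightly fuller sketch of the intended calibration flow. The toy model, batch shapes, and calibration loop are illustrative assumptions; the constructor arguments stay elided here because they are elided in the PR description too, so this is not runnable as-is:

```python
import torch
from torchao.prototype.awq import (
    insert_awq_observer_qdq_, AWQQDQConfig, _is_awq_observed_linear_qdq
)
from torchao.quantization import quantize_

# Toy stand-ins; real usage would target an actual model and dataset.
model = torch.nn.Sequential(torch.nn.Linear(512, 512)).eval()
calibration_batches = [torch.randn(8, 512) for _ in range(16)]

# 1. Swap eligible nn.Linear modules for AWQ observed variants
#    (arguments elided, as in the PR description).
insert_awq_observer_qdq_(model, ...)

# 2. Run calibration data so the observers can search for equalization scales.
with torch.no_grad():
    for batch in calibration_batches:
        model(batch)

# 3. Convert the observed modules to QDQLayout quantized weights for ExecuTorch export.
quantize_(model, AWQQDQConfig(...), filter_fn=_is_awq_observed_linear_qdq)
```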


pytorch-bot bot commented Jun 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2399

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit b137179 with merge base e4f2715:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) Jun 18, 2025
kimishpatel requested a review from metascroy June 18, 2025 03:10
kimishpatel (Contributor, Author) commented:

@metascroy don't need to review quite yet, just wanted to put it up

jerryzh168 (Contributor) commented:

ah thanks @kimishpatel, I also started thinking about this one; I was planning to have a general AWQConfig that has a base_config of Int8DynamicActivationInt4Weight or other base configs, but it's OK to start with this first as well
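
For illustration, a minimal sketch of what such a wrapper config might look like. The class shape and import path are assumptions on my part, not the actual API proposed in #2400:

```python
from dataclasses import dataclass
from torchao.core.config import AOBaseConfig  # assumed import path

@dataclass
class AWQConfig(AOBaseConfig):
    # Any existing torchao quantization config, e.g. one describing
    # int8 dynamic activation + int4 weight; AWQ's equalization-scale
    # search would run on top of whatever this base config specifies.
    base_config: AOBaseConfig
```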

Inline review thread on the following diff excerpt (the excerpt ends mid-call):

```python
layout = QDQLayout()
tensor_dtype = torch.int8

quantized_weight = to_affine_quantized_intx(
```
jerryzh168 (Contributor) commented:

one thing that's slightly different is that now we have to add dynamic quant here in the observer as well, so that the scale is calculated in the context of an input that will be quantized later.
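
A minimal sketch of that idea, with hypothetical helper names (not code from this PR): during the observer's scale search, quantize-dequantize the activation to int8 first, so the equalization scale is chosen against the input the quantized kernel will actually see.

```python
import torch

def fake_quant_int8_per_token(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-token int8 quantize-dequantize, mimicking 8-bit
    # dynamic activation quantization.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127) * scale

def awq_search_loss(x: torch.Tensor, w_dequant: torch.Tensor, w_float: torch.Tensor) -> torch.Tensor:
    # Evaluate a candidate equalization scale by comparing the quantized
    # path against the float reference, using the *quantized* activation.
    x_q = fake_quant_int8_per_token(x)
    return (x_q @ w_dequant.t() - x @ w_float.t()).pow(2).mean()
```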

kimishpatel (Contributor, Author) replied:

yeah, that's exactly it, right? I had thought about this but forgot to instruct Claude

jerryzh168 (Contributor) replied:

oh I see, I think we can refactor AWQ to work with general ao configs, and it might be easier for Claude to add the new functionality: #2400

kimishpatel (Contributor, Author) replied:

My plan was to ask Claude to do a refactor to consolidate code in a subsequent commit. I can take a look at your PR too. Or if you want to commandeer this one, I am fine with that as well.

BTW I thought about what you said about dynamic act quant. The thing that worries me is that if we derive a scale post dynamic act quantization, then we cannot apply that scale before activation quantization. I mean we could, but it won't be the same thing. So I am wondering whether we should do that or not. Might just be worth doing an experiment.
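
To make the ordering concern concrete, a toy example (all names hypothetical): dynamic quantization is non-linear, so quantizing x / s is not the same as quantizing x and then dividing by s, which is why a scale searched post-quantization cannot simply be moved in front of the quantizer.

```python
import torch

def qdq(x: torch.Tensor) -> torch.Tensor:
    # Toy symmetric per-tensor int8 quantize-dequantize.
    scale = x.abs().amax().clamp(min=1e-6) / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127) * scale

x = torch.randn(4, 8)
s = torch.rand(8) + 0.5  # candidate AWQ equalization scale

applied_before = qdq(x / s)  # scale folded in before dynamic quant
applied_after = qdq(x) / s   # what a post-quantization search effectively measures

print(torch.allclose(applied_before, applied_after))  # generally False
```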

kimishpatel (Contributor, Author) commented:

> Int8DynamicActivationInt4Weight

Happy to make changes as seems appropriate. This was mainly an exercise in leveraging Claude Code more than an attempt to implement AWQ for dynamic act quant.
