[WIP] Add AWQ quantization with QDQLayout support for ExecuTorch #2399
base: main
Conversation
This commit implements AWQ (Activation-aware Weight Quantization) with QDQLayout support and 8-bit dynamic activation quantization for ExecuTorch compatibility, addressing GitHub issue #2388.

Key features:
- AWQObserverQDQ: enhanced observer with 8-bit dynamic activation quantization
- QDQLayout integration for ExecuTorch compatibility
- AWQQDQConfig: configuration for the new quantization approach
- Complete API integration with the quantize_() function
- Comprehensive test suite with 8 test cases
- Usage example demonstrating the full workflow

The implementation extends the existing AWQ algorithm to support:
1. QDQLayout (Quantize-Dequantize Layout) for ExecuTorch export
2. 8-bit dynamic quantization of activation scales for better compression
3. A scale search algorithm that maintains AWQ's core optimization approach
4. Seamless integration with the existing torchao quantization infrastructure

Usage:
```python
from torchao.prototype.awq import (
    insert_awq_observer_qdq_,
    AWQQDQConfig,
    _is_awq_observed_linear_qdq,
)
from torchao.quantization import quantize_

insert_awq_observer_qdq_(model, ...)
quantize_(model, AWQQDQConfig(...), filter_fn=_is_awq_observed_linear_qdq)
```
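To make the description above concrete, here is a hedged, self-contained toy sketch of AWQ's core idea (illustrative code only, not the torchao API): scale weight channels up by factors derived from calibration activation magnitudes, quantize the scaled weights, then fold the inverse scale back, so that salient channels lose less precision to rounding.

```python
def quant_dequant_int4(w):
    """Symmetric 4-bit quantize-dequantize of one weight row."""
    max_abs = max(abs(v) for v in w) or 1.0
    qscale = max_abs / 7.0  # int4 symmetric range: [-7, 7]
    return [round(v / qscale) * qscale for v in w]

w = [0.1, 0.2, 0.3, 2.8]  # one weight row; channel 3 dominates the range
x = [3.0, 3.0, 3.0, 0.1]  # calibration activations: channels 0-2 are salient

# AWQ-style per-channel scales from activation magnitudes (alpha = 0.5)
s = [abs(a) ** 0.5 for a in x]

plain = quant_dequant_int4(w)
# Scale weights up, quantize, then fold 1/s back in
awq = [v / si for v, si in
       zip(quant_dequant_int4([wi * si for wi, si in zip(w, s)]), s)]

def output_err(w_hat):
    # Activation-weighted reconstruction error, the quantity AWQ cares about
    return sum(abs(xi) * abs(wi - wh) for xi, wi, wh in zip(x, w, w_hat))

plain_err = output_err(plain)
awq_err = output_err(awq)
print(plain_err, awq_err)  # AWQ scaling reduces the activation-weighted error
```

In this toy setup the salient small-magnitude weights round to zero under plain int4 quantization, while the AWQ-style scaling preserves them at the cost of slightly coarser resolution on the low-activation channel.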
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2399
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures: as of commit b137179 with merge base e4f2715, the following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@metascroy don't need to review quite yet, just wanted to put it up
ah thanks @kimishpatel, I also started thinking about this one. I was planning to have a general AWQConfig that has a base_config of Int8DynamicActivationInt4Weight or other base configs, but it's OK to start with this first as well
layout = QDQLayout()
tensor_dtype = torch.int8

quantized_weight = to_affine_quantized_intx(
one thing that's slightly different is that now we have to add dynamic quant here in the observer as well, so that the scale is calculated in the context of the input that will be quantized later.
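A hedged sketch of the point above (names and shapes are illustrative, not the torchao API): the observer would fake-quantize the activation with 8-bit dynamic per-tensor quantization before running the scale search, so the chosen scales reflect the activation the exported model will actually see.

```python
def fake_quant_int8_dynamic(x):
    """Simulate 8-bit dynamic per-tensor symmetric quantization."""
    max_abs = max(abs(v) for v in x) or 1.0
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in x]

def observe(x, search_scales):
    # Run the AWQ scale search against the fake-quantized activation
    # rather than the float one, mirroring what runs after export.
    x_q = fake_quant_int8_dynamic(x)
    return search_scales(x_q)

acts = [0.5, -3.2, 0.01, 1.7]
# Stand-in for the real scale search: per-channel magnitude ** 0.5
best = observe(acts, search_scales=lambda xq: [abs(v) ** 0.5 for v in xq])
```

The key design choice is that `fake_quant_int8_dynamic` runs inside the observer's forward pass, so the calibration statistics already include the activation rounding error.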
yeah that's exactly it, right? I had thought about this but forgot to instruct Claude
oh I see, I think we can refactor AWQ to work with general ao configs, and it might be easier for Claude to add the new functionality: #2400
my plan was to ask Claude to do a refactor to consolidate the code in a subsequent commit. I can take a look at your PR too. Or if you want to commandeer this one, I am fine with that as well.
BTW, I thought about what you said about dynamic act quant. The thing that worries me is that if we derive the scale post dynamic act quantization, then we cannot apply that scale before activation quantization. I mean we could, but it wouldn't be the same thing. So I am wondering whether we should do that or not. Might be worth running an experiment too.
Happy to make changes as seems appropriate. This was mainly an exercise to leverage Claude Code, more than an attempt to implement AWQ for dynamic act quant.
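The ordering concern raised above can be checked numerically. With per-tensor dynamic int8 activation quantization and a per-channel AWQ scale, dividing by the scale before quantization is in general not the same as dividing after, because rounding does not commute with the per-channel division. A toy sketch (illustrative names and numbers, not the torchao API):

```python
def quant_int8_dynamic(x):
    """Per-tensor symmetric int8 quantize-dequantize."""
    max_abs = max(abs(v) for v in x) or 1.0
    qscale = max_abs / 127.0
    return [round(v / qscale) * qscale for v in x]

x = [0.013, -2.9, 0.39, 1.0]  # one activation row
s = [1.0, 1.0, 3.0, 1.0]      # hypothetical per-channel AWQ scales

# Option 1: divide by s first, then dynamically quantize.
scaled_then_quant = quant_int8_dynamic([v / si for v, si in zip(x, s)])
# Option 2: dynamically quantize first, then divide by s.
quant_then_scaled = [v / si for v, si in zip(quant_int8_dynamic(x), s)]
```

Here channel 2 lands on different quantization grid points depending on the order, while channels with a scale of 1.0 are unaffected, which is the sense in which "it won't be the same thing."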
Still WIP
A good chunk of this was written by Claude