Summary:
As titled: now that the PT2E quantization code has been migrated from PyTorch to torchao, we want to migrate the docs as well.
Test Plan:
check generated docs
Reviewers:
Subscribers:
Tasks:
Tags:
@@ -121,6 +121,87 @@ On a single A100 GPU with 80GB memory, this prints::
    int4 mean time: 4.410 ms
    speedup: 6.9x


PyTorch 2 Export Quantization
=============================

PyTorch 2 Export Quantization is a full graph quantization workflow, mostly for static quantization. It targets hardware that requires both input and output activations and weights to be quantized, and it relies on recognizing operator patterns (such as linear - relu) to make quantization decisions. PT2E quantization produces a graph with quantize and dequantize ops inserted around the operators; during lowering, the quantized operator patterns are fused into real quantized ops. Currently there are two typical lowering paths: (1) torch.compile through Inductor lowering, and (2) ExecuTorch through delegation.

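Schematically, the converted graph wraps each recognized pattern in explicit quantize and dequantize ops. The following is an illustrative sketch of a quantized linear before lowering (simplified pseudocode; the actual graph uses ops such as ``torch.ops.quantized_decomposed.quantize_per_tensor``)::

    x_q   = quantize_per_tensor(x, scale, zero_point, ...)
    x_dq  = dequantize_per_tensor(x_q, scale, zero_point, ...)
    w_dq  = dequantize_per_tensor(w_q, w_scale, w_zero_point, ...)
    out   = torch.nn.functional.linear(x_dq, w_dq)
    out_q = quantize_per_tensor(out, out_scale, out_zero_point, ...)
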
Here we show an example with ``X86InductorQuantizer``.

API Example::

    import torch
    from torch.export import export
    from torchao.quantization.pt2e.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torchao.quantization.pt2e.quantizer.x86_inductor_quantizer import (
        X86InductorQuantizer,
        get_default_x86_inductor_quantization_config,
    )

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(5, 10)

        def forward(self, x):
            return self.linear(x)

    # initialize a floating point model
    float_model = M().eval()

    # define calibration function
    def calibrate(model, data_loader):
        model.eval()
        with torch.no_grad():
            for image, target in data_loader:
                model(image)

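    # (Illustrative addition, not part of the original example: a hypothetical
    # synthetic stand-in for real calibration data; shapes match M above.)
    sample_inference_data = [(torch.randn(2, 5), None) for _ in range(10)]
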
    # Step 1. program capture
    example_inputs = (torch.randn(2, 5),)
    m = export(float_model, example_inputs).module()
    # we get a model with aten ops

    # Step 2. quantization
    # backend developer will write their own Quantizer and expose methods to allow
    # users to express how they want the model to be quantized
    quantizer = X86InductorQuantizer()
    quantizer.set_global(get_default_x86_inductor_quantization_config())

    # or prepare_qat_pt2e for Quantization Aware Training
    m = prepare_pt2e(m, quantizer)

    # run calibration
    # calibrate(m, sample_inference_data)
    m = convert_pt2e(m)

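    # (Illustrative addition: after convert_pt2e, m is a torch.fx.GraphModule,
    # so the inserted quantize/dequantize ops can be inspected directly)
    # m.graph.print_tabular()
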
    # Step 3. lowering
    # lower to target backend

    # Optional: use the C++ wrapper instead of the default Python wrapper
    import torch._inductor.config as config
    config.cpp_wrapper = True

    with torch.no_grad():
        optimized_model = torch.compile(m)

    # run some benchmark
    optimized_model(*example_inputs)

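To sanity-check the reported speedup, here is a minimal timing sketch (an illustrative addition, reusing ``optimized_model`` and ``example_inputs`` from the example above)::

    import time

    # warm up so that compile time is excluded from the measurement
    for _ in range(10):
        optimized_model(*example_inputs)

    start = time.perf_counter()
    for _ in range(100):
        optimized_model(*example_inputs)
    print(f"mean time: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms")
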
Please follow these tutorials to get started on PyTorch 2 Export Quantization:

Modeling Users:

- `PyTorch 2 Export Post Training Quantization <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quant_ptq.html>`_
- `PyTorch 2 Export Quantization Aware Training <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quant_qat.html>`_
- `PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quant_x86_inductor.html>`_
- `PyTorch 2 Export Post Training Quantization with XPU Backend through Inductor <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quant_xpu_inductor.html>`_
- `PyTorch 2 Export Quantization for OpenVINO torch.compile Backend <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quant_openvino.html>`_

Backend Developers (please check out all Modeling Users docs as well):

- `How to Write a Quantizer for PyTorch 2 Export Quantization <https://docs.pytorch.org/ao/stable/tutorial_source/pt2e_quantizer.html>`_