`ai_edge_torch/generative/quantize/README.md`
Once converted, you will get a quantized `.tflite` model which will be ready for …
In the current release, the following schemes are supported:
* Dynamic range quantization: FP32 activations, INT8 weights, and integer computation
* Weight-only quantization: FP32 activations, INT8 weights, and floating point computation
* FP16 quantization: FP16 weights, FP32 activations, and floating point computation for all ops

These correspond to the available recipes in `quant_recipes.py`.
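For instance, a pre-configured recipe can be passed to the converter through its `quant_config` argument. The sketch below is a minimal illustration, not the canonical usage: the helper name `full_int8_dynamic_recipe` is assumed from `quant_recipes.py` (check that module for the exact names in your release), and `pytorch_model`, `tokens`, and `input_pos` are placeholders for your authored model and its sample inputs.

```python
import ai_edge_torch
from ai_edge_torch.generative.quantize import quant_recipes

# Pick one of the pre-configured recipes (helper name assumed here).
quant_config = quant_recipes.full_int8_dynamic_recipe()

# Convert the authored Generative API model with quantization applied.
edge_model = ai_edge_torch.convert(
    pytorch_model,         # placeholder: the authored PyTorch model
    (tokens, input_pos),   # placeholders: sample inputs used for tracing
    quant_config=quant_config,
)

# Write out the quantized .tflite flatbuffer.
edge_model.export("/tmp/model_int8.tflite")
```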
## Advanced usage
In addition to using the pre-configured recipes in `quant_recipes.py`, users can customize recipes to their specific needs using the `LayerQuantRecipe` and `GenerativeQuantRecipe` APIs.
`LayerQuantRecipe` specifies, at the level of a single Generative API layer (`ai_edge_torch/generative/layers`), how the ops within that layer should be quantized. `GenerativeQuantRecipe` specifies, at the model level, how each component of a Generative API model should be quantized. With these configuration classes, selective quantization can be configured as follows:
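The sketch below illustrates one such selective recipe. It is an assumption-laden example rather than the canonical one: the `LayerQuantRecipe` fields (`activation_dtype`, `weight_dtype`, `mode`, `algorithm`, `granularity`), the `quant_attrs` enum values, and the import paths are all assumed from this module's structure; verify them against `quant_recipe.py`, `quant_attrs.py`, and `quant_config.py` in your release.

```python
from ai_edge_torch.generative.quantize import quant_attrs, quant_recipe
from ai_edge_torch.quantize import quant_config  # import path assumed


def _int8_layer(mode):
  # INT8 weights with FP32 activations; `mode` chooses integer computation
  # (DYNAMIC_RANGE) or floating point computation (WEIGHT_ONLY).
  return quant_recipe.LayerQuantRecipe(
      activation_dtype=quant_attrs.Dtype.FP32,
      weight_dtype=quant_attrs.Dtype.INT8,
      mode=mode,
      algorithm=quant_attrs.Algorithm.MIN_MAX,
      granularity=quant_attrs.Granularity.CHANNELWISE,
  )


selective_config = quant_config.QuantConfig(
    generative_recipe=quant_recipe.GenerativeQuantRecipe(
        # Anything not overridden below falls back to FP16 weights.
        default=quant_recipe.LayerQuantRecipe(
            activation_dtype=quant_attrs.Dtype.FP32,
            weight_dtype=quant_attrs.Dtype.FP16,
            mode=quant_attrs.Mode.WEIGHT_ONLY,
            algorithm=quant_attrs.Algorithm.FLOAT_CAST,
            granularity=quant_attrs.Granularity.NONE,
        ),
        embedding=_int8_layer(quant_attrs.Mode.DYNAMIC_RANGE),
        # INT8 weights, but computation stays in FP32 for attention layers.
        attention=_int8_layer(quant_attrs.Mode.WEIGHT_ONLY),
        feedforward=_int8_layer(quant_attrs.Mode.DYNAMIC_RANGE),
    )
)
```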
For example, this recipe specifies that the embedding table, attention, and feedforward layers should be quantized to INT8, with the attention layers keeping their computation in FP32. All other ops fall back to the default scheme, which is specified here as FP16.