
Why are the operators in a static-quantized model still fp32? #24038


Another question: is offline optimization necessary for static quantization? If I don't apply offline optimization and just leave the QDQ nodes in the ONNX model, will there be no performance acceleration, and will performance actually degrade because the dequantize nodes get executed?

By default ONNX Runtime will apply graph optimizations when it loads a model, so it will work the same as if you saved the optimized model offline, but may take longer to load. Only if you turn off graph optimizations will it run un-optimized.


Answer selected by stricklandye
