[OpenVINO] Optimized weight compression for FP4 mode #3737
Conversation
  :param precomputed_scale: Optional precomputed scale.
- :return: Returns quantized (for MXFP8_E4M3, FP4 and FP8_E4M3 normalized)
-     weight tensor and corresponding scale tensor.
+ :return: Returns quantized weight tensor and corresponding scale tensor.
MXFP8_E4M3 and FP8_E4M3 are actually not supported by optimized compression, so there is no need to mention them here.
- MXFP4_QUANTILES = np.array(
+ F4E2M1_QUANTILES = np.array(
MXFP4 is a compression format with an f4e2m1 weight, an f8e8m0 scale, and a group size of 32. This grid is defined according to the f4e2m1 data type alone, irrespective of the other parameters, so it was renamed.
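For reference, f4e2m1 (1 sign bit, 2 exponent bits, 1 mantissa bit) represents only a small fixed set of values, which is why the grid depends on the data type alone. The sketch below is illustrative, not NNCF's actual implementation; the names `F4E2M1_GRID` and `quantize_to_grid` are hypothetical:

```python
import numpy as np

# Distinct values representable by f4e2m1 (sign + 2 exponent + 1 mantissa bit).
# The grid depends only on the data type, not on the f8e8m0 scale or the
# group size of 32 that the MXFP4 format adds on top.
F4E2M1_GRID = np.array(
    [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
      0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
)


def quantize_to_grid(x: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Round every element of x to the nearest grid value."""
    idx = np.abs(x[..., None] - grid).argmin(axis=-1)
    return grid[idx]
```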
daniil-lyakhov left a comment
Great contribution! A couple of minor comments.
Changes
Added optimized weight compression through OpenVINO models for the FP4 compression mode. Results should be similar to MXFP4 (#3550).
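A minimal numpy sketch of what group-wise FP4 compression computes: a per-group absmax scale, followed by rounding the normalized weight to the f4e2m1 grid. All names and details here are illustrative assumptions, not NNCF's actual API or implementation:

```python
import numpy as np

# Distinct values representable by the f4e2m1 data type (assumed grid).
F4E2M1_GRID = np.array(
    [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
      0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
)


def fp4_compress(weight: np.ndarray, group_size: int = 32):
    """Toy group-wise FP4 compression (hypothetical helper).

    Computes a per-group absmax scale that maps the largest magnitude in
    each group to 6.0 (the largest f4e2m1 value), then rounds the
    normalized weights to the nearest grid value.
    """
    w = weight.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    normalized = w / scale
    idx = np.abs(normalized[..., None] - F4E2M1_GRID).argmin(axis=-1)
    return F4E2M1_GRID[idx], scale
```

Dequantizing as `quantized * scale` should then approximate the original weight within roughly half the largest grid gap times the scale.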
Reason for changes
Improving UX: FP4 weight compression now uses the faster OpenVINO-model-based path.
Tests
Extended tests/openvino/optimized_functions/test_compression_functions.py