Summary
- float8 training w/ rowwise scales uses power of 2 scales by default, to reduce quantization error
- float8 inference w/ `Float8DynamicActivationFloat8WeightConfig` using `PerRow` scaling doesn't support power of 2 scales. Users have reported they want to be able to use power of 2 scales for inference after training with them.
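
For reference, a minimal sketch of what rounding a rowwise scale to a power of 2 looks like, written in plain Python rather than against the torchao API. The helper `rowwise_scales`, its `power_of_2` flag, and the `scale = fp8_max / amax` convention are assumptions made for illustration; this is not torchao's implementation. Multiplying by a power of 2 only changes a float's exponent, so the scale/unscale step itself adds no rounding error.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def rowwise_scales(rows, power_of_2=False):
    """Return one scale per row so that row * scale fits the e4m3 range.

    Hypothetical helper for illustration only (not a torchao function).
    """
    scales = []
    for row in rows:
        # Per-row absolute maximum, clamped to avoid division by zero.
        amax = max(max(abs(v) for v in row), 1e-12)
        scale = E4M3_MAX / amax
        if power_of_2:
            # Round DOWN to the nearest power of 2 so scaled values
            # still fit within the representable range.
            scale = 2.0 ** math.floor(math.log2(scale))
        scales.append(scale)
    return scales


rows = [[0.5, -3.0, 1.25], [100.0, 7.0, -20.0]]
plain = rowwise_scales(rows)                   # arbitrary float scales
pow2 = rowwise_scales(rows, power_of_2=True)   # scales restricted to 2**k
print(plain, pow2)
```

Rounding down rather than to nearest is a deliberate choice in this sketch: a scale that is rounded up could push the row's largest value past the float8 maximum.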