feat: Added logging, parallel processing, and CPU processing option for FP8 to BF16 conversion #461

anand-144 · 2025-01-29T17:15:00Z

Implemented logging to track the conversion process and handle missing tensors more effectively.
Introduced parallel processing using ThreadPoolExecutor to speed up large model conversions.
Added an option to use CPU (--use-cpu) instead of GPU for environments without CUDA support.
Optimized memory usage by caching only the two most recently used files.
Updated progress tracking with tqdm for better visibility.
Ensured that _scale_inv references are correctly removed from the model.safetensors.index.json file.
These enhancements improve efficiency, usability, and robustness of the FP8 to BF16 conversion process.
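
For reviewers, here is a minimal, stdlib-only sketch of how the pieces described above could fit together. It is not the code in this PR: `strip_scale_inv`, `TwoFileCache`, `convert_all`, and the `--workers` flag are hypothetical names used for illustration, and the actual tensor loading and FP8 dequantization are replaced by caller-supplied functions. Only `--use-cpu`, `ThreadPoolExecutor`, and the `_scale_inv` naming come from the description.

```python
import argparse
import logging
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fp8_to_bf16_sketch")  # hypothetical logger name


def strip_scale_inv(weight_map):
    """Return a copy of the index weight_map without *_scale_inv entries."""
    return {
        name: shard
        for name, shard in weight_map.items()
        if not name.endswith("_scale_inv")
    }


class TwoFileCache:
    """Keep only the most recently used shard files in memory.

    `loader` is a caller-supplied function (assumption) that maps a file
    name to its loaded contents. This sketch is not thread-safe.
    """

    def __init__(self, loader, max_files=2):
        self.loader = loader
        self.max_files = max_files
        self._files = OrderedDict()

    def get(self, file_name):
        if file_name in self._files:
            # Mark as most recently used.
            self._files.move_to_end(file_name)
        else:
            self._files[file_name] = self.loader(file_name)
            # Evict least recently used entries beyond the limit.
            while len(self._files) > self.max_files:
                evicted, _ = self._files.popitem(last=False)
                logger.info("Evicted %s from cache", evicted)
        return self._files[file_name]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="FP8 to BF16 conversion (sketch)")
    parser.add_argument("--use-cpu", action="store_true",
                        help="run on CPU instead of GPU")
    # Hypothetical flag, not mentioned in the PR description.
    parser.add_argument("--workers", type=int, default=4)
    return parser.parse_args(argv)


def convert_all(shard_names, convert_one, workers=4):
    """Apply convert_one to every shard in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_one, shard_names))
```

In a real script the `--use-cpu` flag would presumably select the device passed to the loader, and the cache would need a lock if shared between worker threads.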


feat: Added logging, parallel processing, and CPU processing op…

e965eec

…tion for FP8 to BF16 conversion

anand-144 closed this Jan 29, 2025

anand-144 reopened this Jan 29, 2025
