Releases: wangzhaode/llm-export
llmexport v0.0.4
Release Notes v0.0.4
New Features
1. EAGLE Support
- Added support for the EAGLE inference-acceleration technique
- Added the `--eagle_path` parameter for exporting EAGLE models
- EAGLE export is supported for Llama and Qwen3 series models
2. Expanded Model Support
- Added support for the SmolLM model series
- Added support for the bge-small embedding model
- Enhanced support for Qwen-series multimodal models
3. Multimodal Model Support
- Improved support for multimodal models such as Qwen3-VL
- Enhanced handling of vision and audio models
Bug Fixes
1. Embedding Loading Fix
- Fixed issues when loading embedding models
- Improved the stability and compatibility of model loading
2. Model Mapping Optimization
- Optimized model type detection and mapping logic
- Enhanced compatibility with different model architectures
Improvements
1. ONNX Export Optimization
- Added an `onnx_export` utility function that unifies the ONNX export flow
- Improved export parameter configuration, with support for dynamic axis settings
2. Quantization Optimization
- Optimized the implementations of the AWQ, HQQ, and Smooth quantization algorithms
- Improved handling of symmetric and asymmetric quantization
3. Performance Optimization
- Optimized model loading and export performance
- Reduced memory usage and improved export efficiency
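The difference between the symmetric and asymmetric handling mentioned above can be shown with a minimal int8 weight-quantization sketch. This is illustrative only, assuming textbook affine quantization; the function names are not from llmexport:

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    # Symmetric: zero point fixed at 0, scale taken from max |w|
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_asymmetric(w, bits=8):
    # Asymmetric: the full [min, max] range is mapped onto [0, 2**bits - 1]
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q_s, s = quantize_symmetric(w)
w_s = q_s.astype(np.float32) * s               # dequantize
q_a, s_a, zp = quantize_asymmetric(w)
w_a = (q_a.astype(np.float32) - zp) * s_a      # dequantize
err_sym = np.abs(w - w_s).max()
err_asym = np.abs(w - w_a).max()
```

Symmetric quantization keeps kernels simple (no zero-point term in the matmul), while asymmetric quantization uses the full integer range when weights are skewed.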
Usage Examples
```bash
# Export a model with EAGLE support
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --eagle_path path/to/eagle

# Export a SmolLM model
llmexport --path SmolLM2-1.7B-Instruct --export onnx
```
Compatibility Notes
- Backward compatible with v0.0.3
- The command-line interface remains consistent
- The new parameters do not affect existing functionality
llmexport v0.0.3
Release Notes - v0.0.3
🎉 Major Updates
This release represents a significant milestone with comprehensive architecture improvements and extensive new model support. The codebase has been completely restructured and synchronized with the latest MNN framework.
🚀 New Features
Model Support
- ✅ SmolLM Series: Added support for SmolLM models with optimized configurations
- ✅ MobileLLM Series: Enhanced support for mobile-optimized language models
- ✅ BGE Models: Added support for bge-small embedding models
- ✅ OpenELM: Support for Apple's OpenELM model series
Quantization Enhancements
- 🔥 AWQ Quantization: Full implementation of AWQ (Activation-aware Weight Quantization)
- 🔥 Symmetric Quantization: Added symmetric quantization support for improved performance
- 🔥 Mixed Quantization: New mixed quantization strategies for optimal model compression
- 🔥 HQQ Quantization: Half-Quadratic Quantization support added
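Schemes like AWQ, HQQ, and mixed quantization are typically built on group-wise (block-wise) quantization, where each small group of weights gets its own scale. A hedged sketch of that underlying primitive, with illustrative names not taken from this codebase:

```python
import numpy as np

def quantize_groupwise(w, group_size=32, bits=4):
    """Symmetric int4 quantization with one scale per group of weights."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for int4
    w = w.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scales = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scales)
max_err = np.abs(w - w_hat).max()
```

Smaller groups mean more scales to store but a tighter per-element error bound, which is the trade-off mixed-quantization strategies tune.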
Architecture Improvements
- 📁 Modular Utils: Complete reorganization with dedicated utility modules:
- Audio processing utilities (audio.py)
- Vision model handling (vision.py)
- GGUF file support (`gguf/`)
- Advanced quantization modules
- MNN conversion utilities
- ONNX optimization tools
Enhanced Capabilities
- 🎵 Audio Models: Added support for audio-enabled models (Qwen2-Audio, etc.)
- 👁️ Vision Models: Enhanced vision model support with specialized processing
- 🔧 LoRA Integration: Improved LoRA weight handling and merging
- 🎯 Model Mapping: Advanced model architecture mapping system
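The LoRA weight merging mentioned above can be sketched in a few lines. This follows the standard LoRA convention (update scaled by alpha/rank folded into the base weight); the function and variable names are illustrative, not from llm-export:

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    # Fold the low-rank update into the base weight: W' = W + (alpha / rank) * B @ A
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = rng.standard_normal((r, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)
W_merged = merge_lora(W, A, B, alpha=8, rank=r)

# The merged layer must behave like base + adapter applied separately
x = rng.standard_normal((1, d_in)).astype(np.float32)
y_merged = x @ W_merged.T
y_split = x @ W.T + (8 / r) * (x @ A.T) @ B.T
```

Merging before export removes the adapter matmuls from the deployed graph, so inference pays no runtime cost for the LoRA weights.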
🐛 Bug Fixes
- Embedding Loading: Fixed critical embedding loading issues
- ONNX Dynamic Axis: Resolved dynamic axis configuration problems
- Linear Layer Bias: Fixed duplicate naming issues in ONNX export for Linear and bias operations
- Model Compatibility: Enhanced compatibility across different model architectures
📚 Documentation Updates
- README Optimization: Completely restructured README with professional badges, clear installation guides, and comprehensive feature documentation
- Model Downloads: Added extensive model download links for both ModelScope and Hugging Face
- Popular Models: Updated with latest high-demand models including:
- DeepSeek-R1-1.5B-Qwen
- Qwen2.5 series (0.5B, 1.5B)
- GPT-OSS-20B
- Qwen3-4B-Instruct-2507
🔧 Technical Improvements
- Code Restructuring: Major refactoring with 10,297 lines added and modular architecture
- Performance Optimization: Enhanced inference speed and memory efficiency
- Cross-platform Support: Improved compatibility across different deployment platforms
- Error Handling: Better error reporting and debugging capabilities
📦 Installation & Usage
```bash
# Install latest version
pip install llmexport==0.0.3

# Quick export example
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4
```
⚠️ Breaking Changes
This version includes significant architectural changes. Please review the updated documentation and examples when upgrading from previous versions.
🙏 Acknowledgments
Special thanks to all contributors and the MNN team for their continuous support and collaboration in making this release possible.
Full Changelog: v0.0.2...v0.0.3
llmexport v0.0.2
Features
- Added support for Qwen2-VL.
- Introduced support for GTE and split embedding layers for BGE/GTE.
- Implemented `imitate_quant` functionality for use during testing.
- Enabled usage of the C++-compiled MNNConvert.
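The release notes don't document `imitate_quant` internals. As a hedged sketch, "imitating" quantization usually means a quantize-dequantize round trip performed in float32, so quantized-model accuracy can be previewed without integer kernels; names here are illustrative:

```python
import numpy as np

def imitate_quant(w, bits=4):
    # Quantize then immediately dequantize, staying in float32, so the rest
    # of the pipeline observes quantization error without int kernels.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal((1, 16)).astype(np.float32)
y_fp = x @ W.T                   # full-precision output
y_q = x @ imitate_quant(W).T     # output with imitated 4-bit weights
quant_err = np.abs(W - imitate_quant(W)).max()
```

Comparing `y_fp` and `y_q` on sample inputs gives a quick accuracy check before committing to a real quantized export.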
Refactors
- Refactored the implementation of the VL model.
- Updated model path handling for ONNX models.
Bug Fixes
- Resolved issues with `stop_ids` and quantization.
- Fixed the bug related to `block_size = 0`.
llmexport v0.0.1
- Support exporting ONNX/MNN from a pretrained model.
- Use FakeLinear to save memory and time when exporting ONNX and MNN.
- Support `onnxslim` to optimize the ONNX graph.
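The notes don't spell out how FakeLinear works. A plausible sketch of the idea, assuming the goal is to trace and export the model graph without materializing real weight tensors, is a drop-in Linear stand-in that only records shape metadata (this class is hypothetical, not the actual implementation):

```python
import numpy as np

class FakeLinear:
    """Stand-in for a Linear layer: keeps only shape metadata and a name,
    allocating no weight tensor, so tracing/export is fast and memory-cheap."""
    def __init__(self, in_features, out_features, name=""):
        self.in_features = in_features
        self.out_features = out_features
        self.name = name  # real weights can be matched back by this name at load time

    def __call__(self, x):
        # Placeholder output with the correct shape; the values are meaningless,
        # which is fine for shape inference and graph tracing.
        return np.zeros(x.shape[:-1] + (self.out_features,), dtype=x.dtype)

layer = FakeLinear(4096, 11008, name="mlp.gate_proj")
y = layer(np.ones((1, 8, 4096), dtype=np.float32))
```

Since no weight matrix is ever allocated, exporting a multi-billion-parameter model's graph structure needs only a fraction of the memory the real weights would occupy.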