
Releases: wangzhaode/llm-export

llmexport v0.0.4

23 Oct 01:22


Release Notes v0.0.4

New Features

1. EAGLE Support

  • Added support for the EAGLE inference-acceleration technique
  • Added the --eagle_path argument for exporting EAGLE models
  • EAGLE export is supported for the Llama and Qwen3 model families

2. Expanded Model Support

  • Added support for the SmolLM model series
  • Added support for the bge-small embedding model
  • Enhanced support for Qwen-series multimodal models

3. Multimodal Model Support

  • Improved support for multimodal models such as Qwen3-VL
  • Enhanced handling of vision and audio models

Bug Fixes

1. Embedding Loading Fix

  • Fixed an issue when loading embedding models
  • Improved the stability and compatibility of model loading

2. Model Mapping Optimization

  • Optimized model-type detection and mapping logic
  • Enhanced compatibility across different model architectures

Improvements

1. ONNX Export Optimization

  • Added an onnx_export utility function to unify the ONNX export flow
  • Improved export parameter configuration, with support for dynamic axes
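As a rough sketch of what a dynamic-axis configuration looks like: torch.onnx.export accepts a dynamic_axes mapping from tensor names to the axes that may vary at runtime. The helper name build_dynamic_axes below is illustrative, not the project's actual API:

```python
def build_dynamic_axes(input_names, output_names):
    """Build the dynamic_axes mapping consumed by torch.onnx.export.

    Marks axis 0 as the batch dimension and axis 1 as the sequence
    dimension for every listed tensor, so the exported model accepts
    variable batch sizes and sequence lengths. (Helper name is
    illustrative, not llmexport's actual API.)
    """
    axes = {}
    for name in list(input_names) + list(output_names):
        axes[name] = {0: "batch", 1: "seq"}
    return axes

dynamic_axes = build_dynamic_axes(["input_ids"], ["logits"])
# Would then be passed as, e.g.:
# torch.onnx.export(model, args, "model.onnx",
#                   input_names=["input_ids"], output_names=["logits"],
#                   dynamic_axes=dynamic_axes)
```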

2. Quantization Optimization

  • Optimized the AWQ, HQQ, and Smooth quantization implementations
  • Improved handling of symmetric and asymmetric quantization
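The difference between the two schemes can be sketched in a few lines (illustrative only, not this project's implementation): symmetric quantization pins the zero point at 0 and scales by the maximum magnitude, while asymmetric quantization maps the full [min, max] range and therefore also needs a zero point:

```python
def symmetric_qparams(values, bits=4):
    # Symmetric: zero point fixed at 0, scale from the max magnitude.
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed int4
    scale = max(abs(v) for v in values) / qmax
    return scale, 0

def asymmetric_qparams(values, bits=4):
    # Asymmetric: the full [min, max] range maps onto [0, 2^bits - 1].
    qmax = 2 ** bits - 1                 # e.g. 15 for unsigned int4
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)      # integer offset for real zero
    return scale, zero_point

w = [-0.9, -0.2, 0.1, 0.6]
s_scale, s_zp = symmetric_qparams(w)     # zero point is always 0
a_scale, a_zp = asymmetric_qparams(w)    # zero point shifts the range
```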

3. Performance Optimization

  • Optimized model loading and export performance
  • Reduced memory usage and improved export efficiency

Usage Examples

# Export a model with EAGLE support
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --eagle_path path/to/eagle

# Export a SmolLM model
llmexport --path SmolLM2-1.7B-Instruct --export onnx

Compatibility Notes

  • Backward compatible with v0.0.3
  • The command-line interface is unchanged
  • New arguments do not affect existing functionality

llmexport v0.0.3

03 Sep 07:03


Release Notes - v0.0.3

🎉 Major Updates

This release represents a significant milestone with comprehensive architecture improvements and extensive new model support. The codebase has been completely restructured and synchronized with the latest MNN framework.

🚀 New Features

Model Support

  • SmolLM Series: Added support for SmolLM models with optimized configurations
  • MobileLLM Series: Enhanced support for mobile-optimized language models
  • BGE Models: Added support for bge-small embedding models
  • OpenELM: Support for Apple's OpenELM model series

Quantization Enhancements

  • 🔥 AWQ Quantization: Full implementation of AWQ (Activation-aware Weight Quantization)
  • 🔥 Symmetric Quantization: Added symmetric quantization support for improved performance
  • 🔥 Mixed Quantization: New mixed quantization strategies for optimal model compression
  • 🔥 HQQ Quantization: Half-Quadratic Quantization support added
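The core idea behind AWQ can be sketched independently of this project's code: per-channel scales derived from activation statistics are folded into the weights and divided out of the activations, leaving the layer's output mathematically unchanged while protecting salient channels during quantization. The values below are illustrative:

```python
def matmul(x, W):
    # x: vector of length [in]; W: matrix [in][out].
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

# AWQ identity: x @ W == (x / s) @ (diag(s) @ W) for any positive s.
x = [2.0, -1.0, 0.5]
W = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.25]]
s = [4.0, 1.0, 2.0]   # per-input-channel scales (from activation stats)

x_scaled = [xi / si for xi, si in zip(x, s)]
W_scaled = [[si * wij for wij in row] for si, row in zip(s, W)]

y_ref = matmul(x, W)
y_awq = matmul(x_scaled, W_scaled)   # identical up to float rounding
```

Scaling up the weight columns that matter most before rounding reduces their relative quantization error, which is why the transformation helps even though it is an exact identity in full precision.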

Architecture Improvements

  • 📁 Modular Utils: Complete reorganization with dedicated utility modules:
    • Audio processing utilities (audio.py)
    • Vision model handling (vision.py)
    • GGUF file support (gguf/)
    • Advanced quantization modules
    • MNN conversion utilities
    • ONNX optimization tools

Enhanced Capabilities

  • 🎵 Audio Models: Added support for audio-enabled models (Qwen2-Audio, etc.)
  • 👁️ Vision Models: Enhanced vision model support with specialized processing
  • 🔧 LoRA Integration: Improved LoRA weight handling and merging
  • 🎯 Model Mapping: Advanced model architecture mapping system
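A common way to merge LoRA weights, shown here as a generic sketch rather than this project's code, is to fold the low-rank update into the base matrix, W' = W + (alpha / r) * B @ A, so inference needs no extra adapter pass:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def merge_lora(W, A, B, alpha, r):
    # Merged weight: W' = W + (alpha / r) * B @ A
    # Shapes: W [out][in], B [out][r], A [r][in].
    # After merging, the adapter adds zero inference cost.
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight [out=2][in=2]
B = [[1.0], [2.0]]             # LoRA down-projection [out=2][r=1]
A = [[0.5, -0.5]]              # LoRA up-projection   [r=1][in=2]
W_merged = merge_lora(W, A, B, alpha=2.0, r=1)
```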

🐛 Bug Fixes

  • Embedding Loading: Fixed critical embedding loading issues
  • ONNX Dynamic Axis: Resolved dynamic axis configuration problems
  • Linear Layer Bias: Fixed duplicate naming issues in ONNX export for Linear and bias operations
  • Model Compatibility: Enhanced compatibility across different model architectures

📚 Documentation Updates

  • README Optimization: Completely restructured README with professional badges, clear installation guides, and comprehensive feature documentation
  • Model Downloads: Added extensive model download links for both ModelScope and Hugging Face
  • Popular Models: Updated with latest high-demand models including:
    • DeepSeek-R1-1.5B-Qwen
    • Qwen2.5 series (0.5B, 1.5B)
    • GPT-OSS-20B
    • Qwen3-4B-Instruct-2507

🔧 Technical Improvements

  • Code Restructuring: Major refactoring with 10,297 lines added and modular architecture
  • Performance Optimization: Enhanced inference speed and memory efficiency
  • Cross-platform Support: Improved compatibility across different deployment platforms
  • Error Handling: Better error reporting and debugging capabilities

📦 Installation & Usage

# Install latest version
pip install llmexport==0.0.3

# Quick export example
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4

⚠️ Breaking Changes

This version includes significant architectural changes. Please review the updated documentation and examples when upgrading from previous versions.

🙏 Acknowledgments

Special thanks to all contributors and the MNN team for their continuous support and collaboration in making this release possible.


Full Changelog: v0.0.2...v0.0.3

llmexport v0.0.2

27 Sep 06:28


Features

  • Added support for Qwen2-VL.
  • Introduced support for GTE and split embedding layers for BGE/GTE.
  • Implemented imitate_quant functionality during testing.
  • Enabled usage of C++ compiled MNNConvert.
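Judging from the name, imitate_quant simulates quantization effects during testing. A generic fake-quantization round trip (quantize to integers, then immediately dequantize) looks like this; it is a sketch of the technique, not the project's implementation:

```python
def imitate_quant(values, bits=4):
    # Simulate quantization error: quantize to ints, dequantize back
    # to floats, so tests observe the same numerics as a deployed
    # quantized model without an actual int kernel.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    out = []
    for v in values:
        q = max(-qmax - 1, min(qmax, round(v / scale)))  # clamp to int range
        out.append(q * scale)                            # dequantize
    return out

w = [0.91, -0.42, 0.13]
w_fake = imitate_quant(w)   # close to w, but with int4 rounding error
```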

Refactors

  • Refactored the implementation of the VL model.
  • Updated model path handling for ONNX models.

Bug Fixes

  • Resolved issues with stop_ids and quantization.
  • Fixed the bug related to block_size = 0.

llmexport v0.0.1

19 Aug 09:31


  • Support exporting ONNX/MNN from pretrained models.
  • Use FakeLinear to save memory and time when exporting ONNX and MNN.
  • Support onnxslim for optimizing the ONNX graph.
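The FakeLinear idea can be sketched without any framework: during export, each real Linear layer is replaced by a placeholder that records only shapes and a name, so the traced graph has the right structure without holding every weight tensor in memory. The class below is illustrative, not the project's actual implementation:

```python
class FakeLinear:
    """Placeholder for a Linear layer during export: stores only the
    layer's shape and name, not its weight tensor, so tracing a large
    model does not keep every weight in memory. (Sketch only; the real
    FakeLinear in llmexport may differ.)"""

    def __init__(self, in_features, out_features, name):
        self.in_features = in_features
        self.out_features = out_features
        self.name = name   # used later to splice real weights back in

    def __call__(self, x_rows):
        # Shape-correct dummy output: a zero matrix [batch][out_features].
        return [[0.0] * self.out_features for _ in x_rows]

fake = FakeLinear(4096, 11008, "mlp.gate_proj")
y = fake([[0.0] * 4096])   # one "token" in, correct output shape out
```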