Releases: wangzhaode/llm-export
llmexport v0.0.4
Release Notes v0.0.4
New Features
1. EAGLE Support
- Added support for the EAGLE inference-acceleration technique
- Added the `--eagle_path` parameter for exporting EAGLE models
- EAGLE export is supported for Llama and Qwen3 series models
2. Expanded Model Support
- Added support for the SmolLM model series
- Added support for the bge-small embedding model
- Enhanced support for Qwen-series multimodal models
3. Multimodal Model Support
- Improved support for multimodal models such as Qwen3-VL
- Enhanced handling of vision and audio models
Bug Fixes
1. Embedding Loading Fix
- Fixed issues when loading embedding models
- Improved the stability and compatibility of model loading
2. Model Mapping Optimization
- Optimized model type detection and mapping logic
- Enhanced compatibility with different model architectures
Improvements
1. ONNX Export Optimization
- Added an `onnx_export` utility function that unifies the ONNX export flow
- Improved export parameter configuration, with support for dynamic axis settings
2. Quantization Optimization
- Optimized the implementations of the AWQ, HQQ, and Smooth quantization algorithms
- Improved handling of symmetric and asymmetric quantization
3. Performance Optimization
- Optimized model loading and export performance
- Reduced memory usage and improved export efficiency
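The difference between the symmetric and asymmetric handling mentioned above can be shown with a minimal int8 weight-quantization sketch. This is illustrative only, assuming textbook affine quantization; the function names are not from llmexport:

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    # Symmetric: zero point fixed at 0, scale taken from max |w|
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_asymmetric(w, bits=8):
    # Asymmetric: the full [min, max] range is mapped onto [0, 2**bits - 1]
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q_s, s = quantize_symmetric(w)
w_s = q_s.astype(np.float32) * s               # dequantize
q_a, s_a, zp = quantize_asymmetric(w)
w_a = (q_a.astype(np.float32) - zp) * s_a      # dequantize
err_sym = np.abs(w - w_s).max()
err_asym = np.abs(w - w_a).max()
```

Symmetric quantization keeps kernels simple (no zero-point term in the matmul), while asymmetric quantization uses the full integer range when weights are skewed.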
Usage Examples
```bash
# Export a model with EAGLE support
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --eagle_path path/to/eagle

# Export a SmolLM model
llmexport --path SmolLM2-1.7B-Instruct --export onnx
```
Compatibility Notes
- Backward compatible with v0.0.3
- The command-line interface remains consistent
- The new parameters do not affect existing functionality
llmexport v0.0.3
Release Notes - v0.0.3
🎉 Major Updates
This release represents a significant milestone with comprehensive architecture improvements and extensive new model support. The codebase has been completely restructured and synchronized with the latest MNN framework.
🚀 New Features
Model Support
- ✅ SmolLM Series: Added support for SmolLM models with optimized configurations
- ✅ MobileLLM Series: Enhanced support for mobile-optimized language models
- ✅ BGE Models: Added support for bge-small embedding models
- ✅ OpenELM: Support for Apple's OpenELM model series
Quantization Enhancements
- 🔥 AWQ Quantization: Full implementation of AWQ (Activation-aware Weight Quantization)
- 🔥 Symmetric Quantization: Added symmetric quantization support for improved performance
- 🔥 Mixed Quantization: New mixed quantization strategies for optimal model compression
- 🔥 HQQ Quantization: Half-Quadratic Quantization support added
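Schemes like AWQ, HQQ, and mixed quantization are typically built on group-wise (block-wise) quantization, where each small group of weights gets its own scale. A hedged sketch of that underlying primitive, with illustrative names not taken from this codebase:

```python
import numpy as np

def quantize_groupwise(w, group_size=32, bits=4):
    """Symmetric int4 quantization with one scale per group of weights."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for int4
    w = w.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scales = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scales)
max_err = np.abs(w - w_hat).max()
```

Smaller groups mean more scales to store but a tighter per-element error bound, which is the trade-off mixed-quantization strategies tune.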
Architecture Improvements
- 📁 Modular Utils: Complete reorganization with dedicated utility modules:
- Audio processing utilities (audio.py)
- Vision model handling (vision.py)
- GGUF file support (`gguf/`)
- Advanced quantization modules
- MNN conversion utilities
- ONNX optimization tools
Enhanced Capabilities
- 🎵 Audio Models: Added support for audio-enabled models (Qwen2-Audio, etc.)
- 👁️ Vision Models: Enhanced vision model support with specialized processing
- 🔧 LoRA Integration: Improved LoRA weight handling and merging
- 🎯 Model Mapping: Advanced model architecture mapping system
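The LoRA weight merging mentioned above can be sketched in a few lines. This follows the standard LoRA convention (update scaled by alpha/rank folded into the base weight); the function and variable names are illustrative, not from llm-export:

```python
import numpy as np

def merge_lora(W, A, B, alpha, rank):
    # Fold the low-rank update into the base weight: W' = W + (alpha / rank) * B @ A
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = rng.standard_normal((r, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)
W_merged = merge_lora(W, A, B, alpha=8, rank=r)

# The merged layer must behave like base + adapter applied separately
x = rng.standard_normal((1, d_in)).astype(np.float32)
y_merged = x @ W_merged.T
y_split = x @ W.T + (8 / r) * (x @ A.T) @ B.T
```

Merging before export removes the adapter matmuls from the deployed graph, so inference pays no runtime cost for the LoRA weights.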
🐛 Bug Fixes
- Embedding Loading: Fixed critical embedding loading issues
- ONNX Dynamic Axis: Resolved dynamic axis configuration problems
- Linear Layer Bias: Fixed duplicate naming issues in ONNX export for Linear and bias operations
- Model Compatibility: Enhanced compatibility across different model architectures
📚 Documentation Updates
- README Optimization: Completely restructured README with professional badges, clear installation guides, and comprehensive feature documentation
- Model Downloads: Added extensive model download links for both ModelScope and Hugging Face
- Popular Models: Updated with latest high-demand models including:
- DeepSeek-R1-1.5B-Qwen
- Qwen2.5 series (0.5B, 1.5B)
- GPT-OSS-20B
- Qwen3-4B-Instruct-2507
🔧 Technical Improvements
- Code Restructuring: Major refactoring with 10,297 lines added and modular architecture
- Performance Optimization: Enhanced inference speed and memory efficiency
- Cross-platform Support: Improved compatibility across different deployment platforms
- Error Handling: Better error reporting and debugging capabilities
📦 Installation & Usage
```bash
# Install latest version
pip install llmexport==0.0.3

# Quick export example
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4
```
⚠️ Breaking Changes
This version includes significant architectural changes. Please review the updated documentation and examples when upgrading from previous versions.
🙏 Acknowledgments
Special thanks to all contributors and the MNN team for their continuous support and collaboration in making this release possible.
Full Changelog: v0.0.2...v0.0.3
llmexport v0.0.2
Features
- Added support for Qwen2-VL.
- Introduced support for GTE and split embedding layers for BGE/GTE.
- Implemented `imitate_quant` functionality for use during testing.
- Enabled usage of the C++-compiled MNNConvert.
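The release notes don't document `imitate_quant` internals. As a hedged sketch, "imitating" quantization usually means a quantize-dequantize round trip performed in float32, so quantized-model accuracy can be previewed without integer kernels; names here are illustrative:

```python
import numpy as np

def imitate_quant(w, bits=4):
    # Quantize then immediately dequantize, staying in float32, so the rest
    # of the pipeline observes quantization error without int kernels.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal((1, 16)).astype(np.float32)
y_fp = x @ W.T                   # full-precision output
y_q = x @ imitate_quant(W).T     # output with imitated 4-bit weights
quant_err = np.abs(W - imitate_quant(W)).max()
```

Comparing `y_fp` and `y_q` on sample inputs gives a quick accuracy check before committing to a real quantized export.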
Refactors
- Refactored the implementation of the VL model.
- Updated model path handling for ONNX models.
Bug Fixes
- Resolved issues with `stop_ids` and quantization.
- Fixed the bug related to `block_size = 0`.
llmexport v0.0.1
- Support exporting ONNX/MNN from a pretrained model.
- Use FakeLinear to save memory and time when exporting ONNX and MNN.
- Support `onnxslim` to optimize the ONNX graph.
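The notes don't spell out how FakeLinear works. A plausible sketch of the idea, assuming the goal is to trace and export the model graph without materializing real weight tensors, is a drop-in Linear stand-in that only records shape metadata (this class is hypothetical, not the actual implementation):

```python
import numpy as np

class FakeLinear:
    """Stand-in for a Linear layer: keeps only shape metadata and a name,
    allocating no weight tensor, so tracing/export is fast and memory-cheap."""
    def __init__(self, in_features, out_features, name=""):
        self.in_features = in_features
        self.out_features = out_features
        self.name = name  # real weights can be matched back by this name at load time

    def __call__(self, x):
        # Placeholder output with the correct shape; the values are meaningless,
        # which is fine for shape inference and graph tracing.
        return np.zeros(x.shape[:-1] + (self.out_features,), dtype=x.dtype)

layer = FakeLinear(4096, 11008, name="mlp.gate_proj")
y = layer(np.ones((1, 8, 4096), dtype=np.float32))
```

Since no weight matrix is ever allocated, exporting a multi-billion-parameter model's graph structure needs only a fraction of the memory the real weights would occupy.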