Commit f1f37d7

Merge pull request #15 from codewithdark-git/feature/add_GGUF
Feature/add gguf
2 parents: 8de0c54 + a94810a

18 files changed: +735 additions, −1227 deletions

.github/workflows/ci.yml

Lines changed: 74 additions & 0 deletions

name: QuantLLM CI/CD

on:
  push:
    branches: [ main ]
    tags:
      - 'v*'
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[dev,test,gguf]
          pip install pytest pytest-cov black isort

      - name: Check code formatting
        run: |
          black . --check
          isort . --check-only

      - name: Run tests
        run: |
          pytest tests/ --cov=quantllm --cov-report=xml

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

  publish:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine

      - name: Build package
        run: python -m build

      - name: Publish to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
        run: |
          twine check dist/*
          twine upload dist/*

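The test job above runs three checks: black formatting, isort import ordering, and pytest with coverage. As a rough sketch of reproducing only the pytest step locally (not part of the workflow; it assumes pytest and pytest-cov are installed, as the CI job installs them):

```python
# Sketch: run the same invocation as the CI "Run tests" step,
# i.e. `pytest tests/ --cov=quantllm --cov-report=xml`, from Python.
import sys

import pytest

exit_code = pytest.main(["tests/", "--cov=quantllm", "--cov-report=xml"])
sys.exit(exit_code)
```
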
.github/workflows/docs.yml

Lines changed: 42 additions & 0 deletions

name: Documentation

on:
  push:
    branches: [ main ]
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  pull_request:
    branches: [ main ]
    paths:
      - 'docs/**'

jobs:
  docs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[docs]
          pip install sphinx sphinx-rtd-theme

      - name: Build documentation
        run: |
          cd docs
          make html

      - name: Deploy to GitHub Pages
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/_build/html

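The docs job builds the Sphinx site with `make html` inside `docs/` and publishes `docs/_build/html`. A rough local equivalent without the Makefile, assuming the Sphinx sources and `conf.py` live in `docs/` (which the workflow's `publish_dir` suggests), might look like this sketch:

```python
# Sketch: build the HTML docs directly with Sphinx instead of `make html`.
# Output path mirrors the workflow's publish_dir (./docs/_build/html).
from sphinx.cmd.build import build_main

exit_code = build_main(["-b", "html", "docs", "docs/_build/html"])
raise SystemExit(exit_code)
```
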
README.md

Lines changed: 76 additions & 48 deletions

@@ -1,44 +1,72 @@
-# 🧠 QuantLLM: Lightweight Library for Quantized LLM Fine-Tuning and Deployment
+# 🧠 QuantLLM: Efficient GGUF Model Quantization and Deployment

 [![PyPI Downloads](https://static.pepy.tech/badge/quantllm)](https://pepy.tech/projects/quantllm)
 <img alt="PyPI - Version" src="https://img.shields.io/pypi/v/quantllm?logo=pypi&label=version&">

-
 ## 📌 Overview

-**QuantLLM** is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) **efficiently** using **4-bit and 8-bit quantization** techniques. It provides a modular and flexible framework for:
-
-- **Loading and quantizing models** with advanced configurations
-- **LoRA / QLoRA-based fine-tuning** with customizable parameters
-- **Dataset management** with preprocessing and splitting
-- **Training and evaluation** with comprehensive metrics
-- **Model checkpointing** and versioning
-- **Hugging Face Hub integration** for model sharing
+**QuantLLM** is a Python library designed for efficient model quantization using the GGUF (GGML Universal Format) method. It provides a robust framework for converting and deploying large language models with minimal memory footprint and optimal performance. Key capabilities include:

-The goal of QuantLLM is to **democratize LLM training**, especially in low-resource environments, while keeping the workflow intuitive, modular, and production-ready.
+- **Memory-efficient GGUF quantization** with multiple precision options (2-bit to 8-bit)
+- **Chunk-based processing** for handling large models
+- **Comprehensive benchmarking** tools
+- **Detailed progress tracking** with memory statistics
+- **Easy model export** and deployment

 ## 🎯 Key Features

 | Feature | Description |
 |----------------------------------|-------------|
-| ✅ Quantized Model Loading | Load HuggingFace models with various quantization techniques (including AWQ, GPTQ, GGUF) in 4-bit or 8-bit precision, featuring customizable settings. |
-| ✅ Advanced Dataset Management | Load, preprocess, and split datasets with flexible configurations |
-| ✅ LoRA / QLoRA Fine-Tuning | Memory-efficient fine-tuning with customizable LoRA parameters |
-| ✅ Comprehensive Training | Advanced training loop with mixed precision, gradient accumulation, and early stopping |
-| ✅ Model Evaluation | Flexible evaluation with custom metrics and batch processing |
-| ✅ Checkpoint Management | Save, resume, and manage training checkpoints with versioning |
-| ✅ Hub Integration | Push models and checkpoints to Hugging Face Hub with authentication |
-| ✅ Configuration Management | YAML/JSON config support for reproducible experiments |
-| ✅ Logging and Monitoring | Comprehensive logging and Weights & Biases integration |
+| ✅ Multiple GGUF Types | Support for various GGUF quantization types (Q2_K to Q8_0) with different precision-size tradeoffs |
+| ✅ Memory Optimization | Chunk-based processing and CPU offloading for efficient handling of large models |
+| ✅ Progress Tracking | Detailed layer-wise progress with memory statistics and ETA |
+| ✅ Benchmarking Tools | Comprehensive benchmarking suite for performance evaluation |
+| ✅ Hardware Optimization | Automatic device selection and memory management |
+| ✅ Easy Deployment | Simple conversion to GGUF format for deployment |
+| ✅ Flexible Configuration | Customizable quantization parameters and processing options |

 ## 🚀 Getting Started

 ### Installation

+Basic installation:
 ```bash
 pip install quantllm
 ```

+With GGUF support (recommended):
+```bash
+pip install quantllm[gguf]
+```
+
+### Quick Example
+
+```python
+from quantllm import QuantLLM
+from transformers import AutoTokenizer
+
+# Load tokenizer and prepare data
+model_name = "facebook/opt-125m"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+calibration_text = ["Example text for calibration."] * 10
+calibration_data = tokenizer(calibration_text, return_tensors="pt", padding=True)["input_ids"]
+
+# Quantize model
+quantized_model, benchmark_results = QuantLLM.quantize_from_pretrained(
+    model_name_or_path=model_name,
+    bits=4,                        # Quantization bits (2-8)
+    group_size=32,                 # Group size for quantization
+    quant_type="Q4_K_M",           # GGUF quantization type
+    calibration_data=calibration_data,
+    benchmark=True,                # Run benchmarks
+    benchmark_input_shape=(1, 32)
+)
+
+# Save and convert to GGUF
+QuantLLM.save_quantized_model(model=quantized_model, output_path="quantized_model")
+QuantLLM.convert_to_gguf(model=quantized_model, output_path="model.gguf")
+```
+
 For detailed usage examples and API documentation, please refer to our:
 - 📚 [Official Documentation](https://quantllm.readthedocs.io/)
 - 🎓 [Tutorials](https://quantllm.readthedocs.io/tutorials/)
@@ -48,39 +76,41 @@ For detailed usage examples and API documentation, please refer to our:

 ### Minimum Requirements
 - **CPU**: 4+ cores
-- **RAM**: 16GB
-- **Storage**: 20GB free space
-- **Python**: 3.8+
+- **RAM**: 16GB+
+- **Storage**: 10GB+ free space
+- **Python**: 3.10+

-### Recommended Requirements
+### Recommended for Large Models
+- **CPU**: 8+ cores
+- **RAM**: 32GB+
 - **GPU**: NVIDIA GPU with 8GB+ VRAM
-- **RAM**: 32GB
-- **Storage**: 50GB+ SSD
 - **CUDA**: 11.7+
+- **Storage**: 20GB+ free space
+
+### GGUF Quantization Types

-### Resource Usage Guidelines
-| Model Size | 4-bit (GPU RAM) | 8-bit (GPU RAM) | CPU RAM (min) |
-|------------|----------------|-----------------|---------------|
-| 3B params | ~6GB | ~9GB | 16GB |
-| 7B params | ~12GB | ~18GB | 32GB |
-| 13B params | ~20GB | ~32GB | 64GB |
-| 70B params | ~90GB | ~140GB | 256GB |
+| Type | Bits | Description | Use Case |
+|---------|------|-----------------------|-----------------------------|
+| Q2_K | 2 | Extreme compression | Size-critical deployment |
+| Q3_K_S | 3 | Small size | Limited storage |
+| Q4_K_M | 4 | Balanced quality | General use |
+| Q5_K_M | 5 | Higher quality | Quality-sensitive tasks |
+| Q8_0 | 8 | Best quality | Accuracy-critical tasks |

 ## 🔄 Version Compatibility

 | QuantLLM | Python | PyTorch | Transformers | CUDA |
 |----------|--------|----------|--------------|-------|
-| latest | ≥3.10 | ≥2.0.0 | ≥4.30.0 | ≥11.7 |
+| 1.2.0 | ≥3.10 | ≥2.0.0 | ≥4.30.0 | ≥11.7 |

 ## 🗺 Roadmap

-- [ ] Multi-GPU training support
-- [ ] AutoML for hyperparameter tuning
-- [ ] Integration of additional advanced quantization algorithms and techniques.
-- [ ] Custom model architecture support
-- [ ] Enhanced logging and visualization
-- [ ] Model compression techniques
-- [ ] Deployment optimizations
+- [ ] Support for more GGUF model architectures
+- [ ] Enhanced benchmarking capabilities
+- [ ] Multi-GPU processing support
+- [ ] Advanced memory optimization techniques
+- [ ] Integration with more deployment platforms
+- [ ] Custom quantization kernels

 ## 🤝 Contributing

@@ -92,14 +122,12 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file

 ## 🙏 Acknowledgments

-- [HuggingFace](https://huggingface.co/) for their amazing Transformers library
-- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization
-- [PEFT](https://github.com/huggingface/peft) for parameter-efficient fine-tuning
-- [Weights & Biases](https://wandb.ai/) for experiment tracking
+- [llama.cpp](https://github.com/ggerganov/llama.cpp) for GGUF format
+- [HuggingFace](https://huggingface.co/) for Transformers library
+- [CTransformers](https://github.com/marella/ctransformers) for GGUF support

 ## 📫 Contact & Support

-- GitHub Issues: [Create an issue](https://github.com/yourusername/QuantLLM/issues)
+- GitHub Issues: [Create an issue](https://github.com/codewithdark-git/QuantLLM/issues)
 - Documentation: [Read the docs](https://quantllm.readthedocs.io/)
-- Discord: [Join our community](https://discord.gg/quantllm)
-
+

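The new README pairs each GGUF type with a bit width and a use case. The sketch below reuses only the parameters shown in the Quick Example above to target one end of that table; the output paths are illustrative, and exact argument handling depends on the installed QuantLLM release.

```python
# Sketch: choose quant_type from the README's GGUF table, reusing the exact
# call shape of the Quick Example. Output paths here are illustrative only.
from quantllm import QuantLLM
from transformers import AutoTokenizer

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
calibration_data = tokenizer(
    ["Example text for calibration."] * 10, return_tensors="pt", padding=True
)["input_ids"]

# Size-critical deployment: Q2_K (2-bit, extreme compression).
# For accuracy-critical tasks, swap in bits=8 and quant_type="Q8_0" instead.
quantized_model, benchmark_results = QuantLLM.quantize_from_pretrained(
    model_name_or_path=model_name,
    bits=2,
    group_size=32,
    quant_type="Q2_K",
    calibration_data=calibration_data,
    benchmark=True,
    benchmark_input_shape=(1, 32),
)

QuantLLM.save_quantized_model(model=quantized_model, output_path="quantized_q2k")
QuantLLM.convert_to_gguf(model=quantized_model, output_path="model-q2_k.gguf")
```
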
docs/api_reference/model.rst

Lines changed: 0 additions & 76 deletions
This file was deleted.
