xTuring makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, Qwen3, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.
Why xTuring:
- Simple API for data prep, training, and inference
- Private by default: run locally or in your VPC
- Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
- Scales from CPU/laptop to multi‑GPU easily
- Evaluate models with built‑in metrics (e.g., perplexity)
Install xTuring from PyPI:

```sh
pip install xturing
```

Run a small, CPU-friendly example first:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load a toy instruction dataset (Alpaca format)
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
# Start small for quick iterations (works on CPU)
model = BaseModel.create("distilgpt2_lora")
# Fine‑tune and then generate
model.finetune(dataset=dataset)
output = model.generate(texts=["Explain quantum computing for beginners."])
print(f"Model output: {output}")Want bigger models and reasoning controls? Try GPT‑OSS variants (requires significant resources):
from xturing.models import BaseModel
# 120B or 20B variants; also support LoRA/INT8/INT4 configs
model = BaseModel.create("gpt_oss_20b_lora")You can find the data folder here.
Highlights from recent updates:
- GPT-OSS integration – Use and fine-tune `gpt_oss_120b` and `gpt_oss_20b` with off-the-shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.

```python
from xturing.models import BaseModel
# Use the production-ready 120B model
model = BaseModel.create('gpt_oss_120b_lora')
# Or use the efficient 20B model for faster inference
model = BaseModel.create('gpt_oss_20b_lora')
# Both models support reasoning levels via system prompts
```
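How you set the reasoning level depends on your setup; below is a minimal, hypothetical sketch that simply embeds a system-style instruction in the prompt text (the exact system-prompt/harmony plumbing may differ, so check the GPT-OSS docs):

```python
from xturing.models import BaseModel

# Hypothetical: encode the desired reasoning level as a system-style prefix.
# The actual system-prompt mechanism may differ in your xTuring version.
model = BaseModel.create("gpt_oss_20b_lora")
prompt = "System: Reasoning: high\n\nUser: Summarize the attention mechanism in two sentences."
output = model.generate(texts=[prompt])
print(output)
```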
- LLaMA 2 integration – Off-the-shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via `GenericModel` or `Llama2`.

```python
from xturing.models import Llama2
model = Llama2()
## or
from xturing.models import BaseModel
model = BaseModel.create('llama2')
```
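The quantized and LoRA variants follow the key templates listed in the model table further below; for example, a sketch loading the LoRA + INT8 version:

```python
from xturing.models import BaseModel

# LoRA + INT8 variant, following the <model_key>_lora_int8 template
model = BaseModel.create('llama2_lora_int8')
```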
- Evaluation – Evaluate any causal LM on any dataset. Currently supports `perplexity`.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the desired model (try GPT-OSS for advanced reasoning)
model = BaseModel.create('gpt_oss_20b')
# Run the Evaluation of the model on the dataset
result = model.evaluate(dataset)
# Print the result
print(f"Perplexity of the evalution: {result}")- INT4 precision – Fine‑tune many LLMs with INT4 using
GenericLoraKbitModel.
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the desired model for INT4 bit fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')
# Run the fine-tuning
model.finetune(dataset)
```

- CPU inference – Run inference on CPUs (including laptops) via Intel® Extension for Transformers, using weight-only quantization and optimized kernels on Intel platforms.

```python
# Make the necessary imports
from xturing.models import BaseModel
# Initialize the model: quantize it with weight-only algorithms
# and replace the linear layers with ITREX's qbits_linear kernel
model = BaseModel.create("llama2_int8")
# Once the model has been quantized, do inferences directly
output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
```

- Batching – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the model in INT4 with LoRA via GenericLoraKbitModel
model = GenericLoraKbitModel('tiiuae/falcon-7b')
# Generate outputs on desired prompts
outputs = model.generate(dataset=dataset, batch_size=10)
```
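As noted in the batching item above, `.evaluate()` also accepts `batch_size`; a minimal sketch reusing the model and dataset from the block above:

```python
# Batched evaluation on the same dataset
result = model.evaluate(dataset, batch_size=10)
```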
- Qwen3 0.6B supervised fine-tuning – The lightweight Qwen3 0.6B checkpoint now has first-class support (registry, configs, docs, and examples), so you can launch SFT/LoRA jobs immediately.

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = BaseModel.create("qwen3_0_6b_lora")
model.finetune(dataset=dataset)
```

See `examples/models/qwen3/qwen3_lora_finetune.py` for a runnable script.
For a hands-on walkthrough, start with the Llama LoRA INT4 working example. For broader usage, see the GenericModel working example in the repository.
You can also chat with a fine-tuned model directly from the CLI:

```sh
$ xturing chat -m "<path-to-model-folder>"
```

Or fine-tune a model and explore it in the built-in UI playground:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground
dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("<model_name>")
model.finetune(dataset=dataset)
model.save("llama_lora_finetuned")
Playground().launch() ## launches localhost UI
```

Tutorials:

- Preparing your dataset (see the dataset sketch after this list)
- Cerebras-GPT fine-tuning with LoRA and INT8
- Cerebras-GPT fine-tuning with LoRA
- LLaMA fine-tuning with LoRA and INT8
- LLaMA fine-tuning with LoRA
- LLaMA fine-tuning
- GPT-J fine-tuning with LoRA and INT8
- GPT-J fine-tuning with LoRA
- GPT-2 fine-tuning with LoRA
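If you want a feel for the instruction dataset format before diving into the tutorial, here is a hypothetical sketch assuming `InstructionDataset` also accepts an in-memory dict with instruction/text/target columns (the dataset preparation tutorial is authoritative):

```python
from xturing.datasets import InstructionDataset

# Assumed column names: "instruction", "text" (optional context), "target" (expected answer)
dataset = InstructionDataset({
    "instruction": ["Translate to French", "Summarize the paragraph"],
    "text": ["Hello, how are you?", "xTuring fine-tunes open-source LLMs on your own data."],
    "target": ["Bonjour, comment allez-vous ?", "xTuring fine-tunes LLMs on private data."],
})
```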
Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model, using the Alpaca dataset (52K instructions) for fine-tuning.
Hardware:
4xA100 40GB GPU, 335GB CPU RAM
Fine-tuning parameters:
```
{
  'maximum sequence length': 512,
  'batch size': 1,
}
```

| LLaMA-7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
|---|---|---|---|
| GPU | 33.5 GB | 23.7 GB | 21.9 GB |
| CPU | 190 GB | 10.2 GB | 14.9 GB |
| Time/epoch | 21 hours | 20 mins | 20 mins |
Contribute by submitting your performance results on other GPUs: create an issue with your hardware specifications, memory consumption, and time per epoch.
We have already fine-tuned some models that you can use as your base or start playing with. Here is how you would load them:
```python
from xturing.models import BaseModel

model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
```

| Model | Dataset | Path |
|---|---|---|
| DistilGPT-2 LoRA | alpaca | x/distilgpt2_lora_finetuned_alpaca |
| LLaMA LoRA | alpaca | x/llama_lora_finetuned_alpaca |
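Once loaded, a hub checkpoint behaves like any other model; for example:

```python
# Generate directly from the fine-tuned checkpoint loaded above
output = model.generate(texts=["Explain quantum computing for beginners."])
print(output)
```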
Below is a list of all the models supported via the `BaseModel` class of xTuring and the corresponding keys to load them.
| Model | Key |
|---|---|
| Bloom | bloom |
| Cerebras | cerebras |
| DistilGPT-2 | distilgpt2 |
| Falcon-7B | falcon |
| Galactica | galactica |
| GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b |
| GPT-J | gptj |
| GPT-2 | gpt2 |
| LLaMA | llama |
| LLaMA2 | llama2 |
| MiniMaxM2 | minimax_m2 |
| OPT-1.3B | opt |
The above are the base variants. Use these templates for LoRA, INT8, and INT8 + LoRA versions:
| Version | Template |
|---|---|
| LoRA | <model_key>_lora |
| INT8 | <model_key>_int8 |
| INT8 + LoRA | <model_key>_lora_int8 |
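For example, combining a key from the model table with one of these templates:

```python
from xturing.models import BaseModel

# LoRA version of OPT-1.3B (key "opt" + the _lora template)
model = BaseModel.create("opt_lora")

# INT8 + LoRA version
model = BaseModel.create("opt_lora_int8")
```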
To load a model's INT4 + LoRA version, use the `GenericLoraKbitModel` class:

```python
from xturing.models import GenericLoraKbitModel

model = GenericLoraKbitModel('<model_path>')
```

Replace `<model_path>` with a local directory or a Hugging Face model such as `facebook/opt-1.3b`.
Features and roadmap:

- Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica`, and `Bloom` models
- Dataset generation using self-instruction
- Low-precision LoRA fine-tuning and unsupervised fine-tuning
- INT8 low-precision fine-tuning support
- OpenAI, Cohere, and Claude model APIs for dataset generation
- Added fine-tuned checkpoints for some models to the hub
- INT4 LLaMA LoRA fine-tuning demo
- INT4 LLaMA LoRA fine-tuning with INT4 generation
- Support for a `Generic model` wrapper
- Support for the `Falcon-7B` model
- INT4 low-precision fine-tuning support
- Evaluation of LLM models
- INT3, INT2, INT1 low-precision fine-tuning support
- Support for Stable Diffusion
If you have any questions, you can create an issue on this repository.
You can also join our Discord server and start a discussion in the #xturing channel.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.

