RULER (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—no labeled data, expert feedback, or reward engineering required.
✨ Key Benefits:
- 2-3x faster development - Skip reward function engineering entirely
- General-purpose - Works across any task without modification
- Strong performance - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
- Easy integration - Drop-in replacement for manual reward functions
# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass
# After: One line with RULER
judged_group = await ruler_score_group(group, "openai/o3")ART is an open-source RL framework that improves agent reliability by allowing LLMs to learn from experience. ART provides an ergonomic harness for integrating GRPO into any python application. For a quick hands-on introduction, run one of the notebooks below. When you're ready to learn more, check out the docs.
| Agent Task | Example Notebook | Description | Comparative Performance | 
|---|---|---|---|
| ART•E [RULER] | 🏋️ Train agent | Qwen 2.5 7B learns to search emails using RULER | |
| 2048 | 🏋️ Train agent | Qwen 2.5 3B learns to play 2048 | |
| Temporal Clue | 🏋️ Train agent | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] | 
| Tic Tac Toe | 🏋️ Train agent | Qwen 2.5 3B learns to play Tic Tac Toe | |
| Codenames | 🏋️ Train agent | Qwen 2.5 3B learns to play Codenames |  benchmarks | 
| AutoRL [RULER] | 🏋️ Train agent | Train Qwen 2.5 7B to master any task | [Link coming soon] | 
Explore our latest research and updates on building SOTA agents.
- 🗞️ AutoRL: Zero-Data Training for Any Task - Train custom AI models without labeled data using automatic input generation and RULER evaluation.
- 🗞️ RULER: Easy Mode for RL Rewards is now available for automatic reward generation in reinforcement learning.
- 🗞️ ART·E: How We Built an Email Research Agent That Beats o3 demonstrates a Qwen 2.5 14B email agent outperforming OpenAI's o3.
- 🗞️ ART Trainer: A New RL Trainer for Agents enables easy training of LLM-based agents using GRPO.
- ART provides convenient wrappers for introducing RL training into existing applications. We abstract the training server into a modular service that your code doesn't need to interface with.
- Train from anywhere. Run the ART client on your laptop and let the ART server kick off an ephemeral GPU-enabled environment, or run on a local GPU.
- Integrations with hosted platforms like W&B, Langfuse, and OpenPipe provide flexible observability and simplify debugging.
- ART is customizable with intelligent defaults. You can configure training parameters and inference engine configurations to meet specific needs, or take advantage of the defaults, which have been optimized for training efficiency and stability.
ART agents can be trained from any client machine that runs python. To add to an existing project, run this command:
pip install openpipe-art
Curious about how to use ART for a real-world task? Check out the ART•E Agent blog post, where we detail how we trained Qwen 2.5 14B to beat o3 at email retrieval!
ART's functionality is divided into a client and a server. The OpenAI-compatible client is responsible for interfacing between ART and your codebase. Using the client, you can pass messages and get completions from your LLM as it improves. The server runs independently on any machine with a GPU. It abstracts away the complexity of the inference and training portions of the RL loop while allowing for some custom configuration. An outline of the training loop is shown below:
- 
Inference - Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).
- Completion requests are routed to the ART server, which runs the model's latest LoRA in vLLM.
- As the agent executes, each system,user, andassistantmessage is stored in a Trajectory.
- When a rollout finishes, your code assigns a rewardto its Trajectory, indicating the performance of the LLM.
 
- 
Training - When each rollout has finished, Trajectories are grouped and sent to the server. Inference is blocked while training executes.
- The server trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).
- The server saves the newly trained LoRA to a local directory and loads it into vLLM.
- Inference is unblocked and the loop resumes at step 1.
 
This training loop runs until a specified number of inference and training iterations have completed.
ART should work with most vLLM/HuggingFace-transformers compatible causal language models, or at least the ones supported by Unsloth. Gemma 3 does not appear to be supported for the time being. If any other model isn't working for you, please let us know on Discord or open an issue on GitHub!
ART is in active development, and contributions are most welcome! Please see the CONTRIBUTING.md file for more information.
@misc{hilton2025art,
  author = {Brad Hilton and Kyle Corbitt and David Corbitt and Saumya Gandhi and Angky William and Bohdan Kovalenskyi and Andie Jones},
  title = {ART: Agent Reinforcement Trainer},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/openpipe/art}}
}This repository's source code is available under the Apache-2.0 License.
ART stands on the shoulders of giants. While we owe many of the ideas and early experiments that led to ART's development to the open source RL community at large, we're especially grateful to the authors of the following projects:
Finally, thank you to our partners who've helped us test ART in the wild! We're excited to see what you all build with it.
 
