Releases: OpenPipe/ART

v0.4.4

17 Jul 07:26

ART 0.4.4 Release Notes

New Features

  • SkyPilot Integration Enhancement: Added SkyPilot extras support for improved cloud deployment
    capabilities (#255)
  • Reward System Improvements: Added an experimental option to disable reward scaling, providing more
    flexibility in reward configuration (#fd2a118)
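
As a rough illustration of what disabling reward scaling can mean, the sketch below assumes GRPO-style training, where each reward is centered on its group's mean and, when scaling is on, divided by the group's standard deviation. The function name and the scale flag are hypothetical and are not ART's API; commit fd2a118 has the actual option.

```python
from statistics import mean, pstdev


def group_advantages(rewards, scale=True, eps=1e-6):
    """Center rewards on the group mean; optionally divide by the group std.

    Hypothetical helper for illustration only, not part of ART.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not scale:
        # Unscaled: keep the raw reward differences within the group.
        return centered
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [c / (sigma + eps) for c in centered]


rewards = [0.0, 0.5, 1.0]
print(group_advantages(rewards, scale=False))  # [-0.5, 0.0, 0.5]
print(group_advantages(rewards, scale=True))
```

With scaling off, the spread of raw rewards within a group carries through to the advantages instead of being normalized away.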

Documentation & Examples

  • New Tutorial: Added temporal-clue-7b.ipynb notebook demonstrating temporal reasoning capabilities
    (#a2802b3)
  • Enhanced Documentation: Updated RULER documentation with comprehensive guidance on combining
    rewards (#250)
  • ART•E Integration: Added ART•E notebook examples to documentation (#242, #240)

Bug Fixes & Improvements

  • Dependency Management: Reverted to previous version of gql to resolve compatibility issues (#249)
  • Unsloth Integration: Added experimental logprob pre-calculation support for Unsloth services
  • Installation Fixes: Improved backend dependency installation when using local ART paths (#241)
  • Documentation Updates: Various minor documentation improvements and clarifications

Technical Improvements

  • Updated SkyPilot backend installation instructions
  • Removed obsolete numpy installation cells from quickstart examples
  • Enhanced dependency synchronization

v0.4.3

15 Jul 05:29

ART 0.4.3 Release Notes

Breaking Changes

SkyPilot is now an optional dependency. If you use SkyPilotBackend, you must now install ART with the skypilot extra:

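# Before: the default install included the SkyPilot dependencies (no longer the case)
pip install openpipe-art

# Now required for SkyPilotBackend users (quoted so shells such as zsh don't expand the brackets)
pip install "openpipe-art[skypilot]"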

What's Changed

Dependencies

  • Moved SkyPilot dependencies (semver>=3.0.4 and skypilot==0.9.3) to an optional dependency group [skypilot] (#235)
  • This reduces the default installation size for users who don't need SkyPilot functionality

Documentation Updates

  • Updated installation instructions in all relevant documentation:
    • Installation + Setup guide
    • ART Backend documentation
    • Summarizer tutorial

Migration Guide

If you're using SkyPilotBackend in your code:

# Your existing code doesn't need to change, just update the installation
from art.skypilot import SkyPilotBackend
backend = SkyPilotBackend(...)

Simply install with: pip install "openpipe-art[skypilot]" or uv add "openpipe-art[skypilot]"
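
If your code may run in environments where the extra is missing, a guarded import can turn the failure into a clearer message. This is a minimal sketch, assuming that importing art.skypilot without the skypilot extra raises ImportError; the helper name load_skypilot_backend is hypothetical and not part of ART.

```python
def load_skypilot_backend():
    """Return the SkyPilotBackend class, or raise a descriptive error.

    Assumes `from art.skypilot import SkyPilotBackend` raises ImportError
    when ART or its optional SkyPilot dependencies are not installed.
    """
    try:
        from art.skypilot import SkyPilotBackend
    except ImportError as exc:
        raise RuntimeError(
            "SkyPilotBackend needs the optional extra; install it with: "
            'pip install "openpipe-art[skypilot]"'
        ) from exc
    return SkyPilotBackend
```

Call load_skypilot_backend() once at startup and construct the backend from the returned class as before.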

Full Changelog

See PR #235 for complete details.

v0.4.2

14 Jul 21:30

What's Changed

  • Fix client import error by vendoring transformers constants (#232)
  • docs: Add comprehensive documentation for additional_histories feature (#231)
  • Fix Ruff lint (#229)
  • Update 2048 code to use RULER (#228)
  • Add RULER notebook for 2048 (#227)
  • Add RULER promotional snippet to README (#225)
  • Add run_checks.sh script for code quality checks (#224)
  • Update README (#223)
  • fix python version in art-e (#222)
  • ruler docs (#221)
  • feat: Decouple vLLM & Unsloth Trainer (#212)

Full Changelog: v0.4.0...v0.4.2

v0.4.1

14 Jul 18:50

What's Changed

  • Fix client import error by vendoring transformers constants (#232)
  • Fix Ruff lint (#229)
  • Update 2048 code to use RULER (#228)
  • Add RULER notebook for 2048 (#227)
  • Add RULER promotional snippet to README (#225)
  • Add run_checks.sh script for code quality checks (#224)
  • Update README (#223)
  • fix python version in art-e (#222)
  • ruler docs (#221)
  • feat: Decouple vLLM & Unsloth Trainer (#212)

Full Changelog: v0.4.0...v0.4.1

v0.4.0

11 Jul 06:34

🚀 Introducing RULER: Relative Universal LLM-Elicited Rewards

We're excited to announce ART v0.4.0, featuring RULER - a groundbreaking general-purpose reward function that makes agent training dramatically easier and faster!

📏 What is RULER?

RULER (Relative Universal LLM-Elicited Rewards) uses an LLM-as-judge to rank agent trajectories, eliminating the need for:

  • ❌ Labeled training data
  • ❌ Expert feedback
  • ❌ Hand-crafted reward functions

Yet it often matches or exceeds the performance of carefully designed reward functions!

🎯 Key Benefits

  • 2-3x faster development: Skip the tedious reward engineering phase
  • Universal application: Works across diverse RL tasks without modification
  • Production-ready: Battle-tested on real tasks with impressive results
  • Simple integration: Just a few lines of code to get started

📖 Learn More

Check out the RULER documentation to see how easy it is to use:

from art.rewards import ruler_score_group

# Score your trajectories with one line
# (ruler_score_group is awaited, so call it from inside an async function)
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")

Read the full launch announcement for detailed performance comparisons and insights.

What's Changed

Major Features

  • Add RULER reward function (#218) 🎉
  • RULER documentation (#221)

Other Improvements

  • Update README (#223)
  • fix python version in art-e (#222)
  • Add setproctitle as dep in colab notebooks (#220)
  • Move plotting dependencies to optional group (#217)
  • feat: tau-bench brad 003 (#216)
  • Allow validation_loader argument to train method (#215)
  • Update swe-bench example docs (#214)
  • chore: Remove workaround for torch-compile and use --torch-compile flag (#212)
  • Adds option to use padding with --torch-compile (#211)
  • Fix tau-bench example (#210)
  • art-2048: update qwen model identifier (#209)
  • Allow Unsloth to use --pad_token when tokenizer has no pad token (#208)
  • Allow using specific wandb projects in the CLI (#207)
  • feat: Allow using get_peft_model to re-initialize trainer state (#206)
  • chore: Add art_trainer module with ART's TRL Trainer (#205)
  • Art tau bench example (#204)
  • 🔊 Improve noisy startup (#203)
  • feat: SWE-Bench Example (#201)
  • Update to 0.3.13, pin accelerate (#197)

Full Changelog: v0.3.13...v0.4.0

Release v0.3.13

11 Jul 06:27
c84c141

What's Changed

  • chore: Update TRL (#187)
  • allow training without logprobs experimentation (#186)
  • chore: Upgrade Unsloth dependencies (#183)
  • chore: SWE-Bench related changes (#181)
  • Bump uv to >=0.6.15 (#180)
  • Tau bench async rl (#179)
  • Track entropy at training time (#178)
  • Adds vllm metrics to wandb (#177)
  • Pin openpipe-art and accelerate versions in notebooks (#175)
  • Release 0.3.12 (#174)
  • Match default SkyPilotBackend version to client (#173)
  • feat: Add support for multiple histories (#170)
  • [WIP] More docs (#169)

Full Changelog: v0.3.12...v0.3.13

Release v0.3.12

11 Jul 06:26
7032d6c

What's Changed

  • Tau bench async (#168)
  • Refactor dev/tau-bench for true async (#167)
  • ART-E updates (#166)
  • Add langfuse tracing to run_rl.py (#165)
  • Make rollout_tau_bench_task synchronous (#164)
  • feat: Multi-device training (#163)
  • Create run_training.py for remote training (#162)
  • Create run_rl.py with ART RL loop (#161)
  • Wandb weave (#158)
  • Basic W&B Weave integration (#157)
  • Properly read base model from CLI (#156)
  • Deploy model locally (#155)
  • Fix s3 utils typo (#153)
  • Fix busy wait in vllm test client (#152)
  • Fix comment (#151)
  • dev: swebench (#149)
  • fix: Improve retry util typing (#148)
  • Add get_guided_completion_params and use in tic tac toe self play (#147)
  • Pin vllm to 0.8.5 (#146)

Full Changelog: v0.3.11...v0.3.12

Release v0.3.11

11 Jul 06:26
126fa27

What's Changed

  • Limit number of metrics shown in gather_trajectory_groups (#145)

Full Changelog: v0.3.10...v0.3.11

Release v0.3.10

11 Jul 06:26
f63ac00

What's Changed

  • Update package version to 0.3.9 (#143)
  • Speed up step deployment (#142)
  • Add tic tac toe self-play example (#141)
  • Fix training stability issues with new vLLM version (#140)

Full Changelog: v0.3.9...v0.3.10

Release v0.3.9

11 Jul 06:26
7fdfbeb

What's Changed

  • Serialize model config (#139)
  • Properly log trajectories and metrics via remote backend (#138)
  • Properly sync workdir (#137)
  • Revert to an older vLLM version, since the latest one shows regressions in the convergence of GRPO training (#136)
  • feat: Add force_restart option to SkypilotBackend (#135)
  • Fix train step (#134)
  • Revert some of the thread safe changes (#133)
  • Enable asymmetric PPO clipping (#132)
  • Support close method on remote backend (#129)
  • Fix hanging (#126)
  • update version (#122)
  • Update to version 0.3.6 (#121)
  • Simplify tic tac toe example (#120)
  • add qwen 3 support (#118)

Full Changelog: v0.3.7...v0.3.9