Releases: OpenPipe/ART
v0.4.4
ART 0.4.4 Release Notes
New Features
- SkyPilot Integration Enhancement: Added SkyPilot extras support for improved cloud deployment capabilities (#255)
- Reward System Improvements: Added experimental support for leaving rewards unscaled, providing more flexibility in reward configuration (fd2a118)
Documentation & Examples
- New Tutorial: Added temporal-clue-7b.ipynb notebook demonstrating temporal reasoning capabilities (a2802b3)
- Enhanced Documentation: Updated RULER documentation with comprehensive guidance on combining rewards (#250)
- ART•E Integration: Added ART•E notebook examples to documentation (#242, #240)
Bug Fixes & Improvements
- Dependency Management: Reverted to previous version of gql to resolve compatibility issues (#249)
- Unsloth Integration: Added experimental logprob pre-calculation support for Unsloth services
- Installation Fixes: Improved backend dependency installation when using local ART paths (#241)
- Documentation Updates: Various minor documentation improvements and clarifications
Technical Improvements
- Updated SkyPilot backend installation instructions
- Removed obsolete numpy installation cells from quickstart examples
- Enhanced dependency synchronization
v0.4.3
ART 0.4.3 Release Notes
Breaking Changes
SkyPilot is now an optional dependency. If you use SkyPilotBackend, you must now install ART with the skypilot extra:
# Before (no longer works)
pip install openpipe-art
# Now required for SkyPilotBackend users
pip install openpipe-art[skypilot]
What's Changed
Dependencies
- Moved SkyPilot dependencies (semver>=3.0.4 and skypilot==0.9.3) to an optional dependency group [skypilot] (#235)
- This reduces the default installation size for users who don't need SkyPilot functionality
Documentation Updates
- Updated installation instructions in all relevant documentation:
- Installation + Setup guide
- ART Backend documentation
- Summarizer tutorial
Migration Guide
If you're using SkyPilotBackend in your code:
# Your existing code doesn't need to change, just update the installation
from art.skypilot import SkyPilotBackend
backend = SkyPilotBackend(...)
Simply install with pip install openpipe-art[skypilot] or uv add openpipe-art[skypilot].
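Because the extra is now optional, importing art.skypilot in an environment without it will fail at import time. Below is a minimal, hypothetical guard (not part of the ART API) that surfaces the new install requirement; the hint message is ours, not ART's.

# Hypothetical import guard for environments that may lack the skypilot extra.
try:
    from art.skypilot import SkyPilotBackend
except ImportError as err:
    raise ImportError(
        "SkyPilotBackend now requires the optional extra: "
        "pip install 'openpipe-art[skypilot]'"
    ) from err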
Full Changelog
See PR #235 for complete details.
v0.4.2
What's Changed
- Fix client import error by vendoring transformers constants (#232)
- docs: Add comprehensive documentation for additional_histories feature (#231)
- Fix Ruff lint (#229)
- Update 2048 code to use RULER (#228)
- Add RULER notebook for 2048 (#227)
- Add RULER promotional snippet to README (#225)
- Add run_checks.sh script for code quality checks (#224)
- Update README (#223)
- fix python version in art-e (#222)
- ruler docs (#221)
- feat: Decouple vLLM & Unsloth Trainer (#212)
Full Changelog: v0.4.0...v0.4.2
v0.4.1
What's Changed
- Fix client import error by vendoring transformers constants (#232)
- Fix Ruff lint (#229)
- Update 2048 code to use RULER (#228)
- Add RULER notebook for 2048 (#227)
- Add RULER promotional snippet to README (#225)
- Add run_checks.sh script for code quality checks (#224)
- Update README (#223)
- fix python version in art-e (#222)
- ruler docs (#221)
- feat: Decouple vLLM & Unsloth Trainer (#212)
Full Changelog: v0.4.0...v0.4.1
v0.4.0
🚀 Introducing RULER: Relative Universal LLM-Elicited Rewards
We're excited to announce ART v0.4.0, featuring RULER - a groundbreaking general-purpose reward function that makes agent training dramatically easier and faster!
📏 What is RULER?
RULER (Relative Universal LLM-Elicited Rewards) uses an LLM-as-judge to rank agent trajectories, eliminating the need for:
- ❌ Labeled training data
- ❌ Expert feedback
- ❌ Hand-crafted reward functions
Yet it often matches or exceeds the performance of carefully designed reward functions!
🎯 Key Benefits
- 2-3x faster development: Skip the tedious reward engineering phase
- Universal application: Works across diverse RL tasks without modification
- Production-ready: Battle-tested on real tasks with impressive results
- Simple integration: Just a few lines of code to get started
📖 Learn More
Check out the RULER documentation to see how easy it is to use:
from art.rewards import ruler_score_group
# Score your trajectories with one line
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")
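For a slightly fuller picture, here is a sketch of how RULER might slot into a rollout loop. The art.TrajectoryGroup constructor and the .trajectories / .reward attribute names below are assumptions based on the ART docs, not something these notes confirm; check the RULER documentation for the exact API.

import art
from art.rewards import ruler_score_group

# trajectories: a list of art.Trajectory objects produced by rollouts
# of the same scenario. Group them so the judge can rank them
# relative to each other.
group = art.TrajectoryGroup(trajectories)
judged_group = await ruler_score_group(group, "openai/gpt-4o-mini")

# RULER assigns each trajectory a relative reward; no labels or
# hand-crafted reward function required.
for trajectory in judged_group.trajectories:
    print(trajectory.reward)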
Read the full launch announcement for detailed performance comparisons and insights.
What's Changed
Major Features
- RULER: Relative Universal LLM-Elicited Rewards (see the announcement above)
Other Improvements
- Update README (#223)
- fix python version in art-e (#222)
- Add setproctitle as dep in colab notebooks (#220)
- Move plotting dependencies to optional group (#217)
- feat: tau-bench brad 003 (#216)
- Allow validation_loader argument to train method (#215)
- Update swe-bench example docs (#214)
- chore: Remove workaround for torch-compile and use --torch-compile flag (#212)
- Adds option to use padding with --torch-compile (#211)
- Fix tau-bench example (#210)
- art-2048: update qwen model identifier (#209)
- Allow Unsloth to use --pad_token when tokenizer has no pad token (#208)
- Allow using specific wandb projects in the CLI (#207)
- feat: Allow using get_peft_model to re-initialize trainer state (#206)
- chore: Add art_trainer module with ART's TRL Trainer (#205)
- Art tau bench example (#204)
- 🔊 Improve noisy startup (#203)
- feat: SWE-Bench Example (#201)
- Update to 0.3.13, pin accelerate (#197)
Full Changelog: v0.3.13...v0.4.0
Release v0.3.13
What's Changed
- chore: Update TRL (#187)
- allow training without logprobs experimentation (#186)
- chore: Upgrade Unsloth dependencies (#183)
- chore: SWE-Bench related changes (#181)
- Bump uv to >=0.6.15 (#180)
- Tau bench async rl (#179)
- Track entropy at training time (#178)
- Adds vllm metrics to wandb (#177)
- Pin openpipe-art and accelerate versions in notebooks (#175)
- Release 0.3.12 (#174)
- Match default SkyPilotBackend version to client (#173)
- feat: Add support for multiple histories (#170)
- [WIP] More docs (#169)
Full Changelog: v0.3.12...v0.3.13
Release v0.3.12
What's Changed
- Tau bench async (#168)
- Refactor dev/tau-bench for true async (#167)
- ART-E updates (#166)
- Add langfuse tracing to run_rl.py (#165)
- Make rollout_tau_bench_task synchronous (#164)
- feat: Multi-device training (#163)
- Create run_training.py for remote training (#162)
- Create run_rl.py with ART RL loop (#161)
- Wandb weave (#158)
- Basic W&B Weave integration (#157)
- Properly read base model from CLI (#156)
- Deploy model locally (#155)
- Fix s3 utils typo (#153)
- Fix busy wait in vllm test client (#152)
- Fix comment (#151)
- dev: swebench (#149)
- fix: Improve retry util typing (#148)
- Add get_guided_completion_params and use in tic tac toe self play (#147)
- Pin vllm to 0.8.5 (#146)
Full Changelog: v0.3.11...v0.3.12
Release v0.3.11
What's Changed
- Limit number of metrics shown in gather_trajectory_groups (#145)
Full Changelog: v0.3.10...v0.3.11
Release v0.3.10
What's Changed
- Update package version to 0.3.9 (#143)
- Speed up step deployment (#142)
- Add tic tac toe self-play example (#141)
- Fix training stability issues with new vLLM version (#140)
Full Changelog: v0.3.9...v0.3.10
Release v0.3.9
What's Changed
- Serialize model config (#139)
- Properly log trajectories and metrics via remote backend (#138)
- Properly sync workdir (#137)
- reverting to older vllm version, since the latest one shows regressions in convergence of grpo training (#136)
- feat: Add force_restart option to SkypilotBackend (#135)
- Fix train step (#134)
- Revert some of the thread safe changes (#133)
- Enable asymmetric PPO clipping (#132)
- Support close method on remote backend (#129)
- Fix hanging (#126)
- update version (#122)
- Update to version 0.3.6 (#121)
- Simplify tic tac toe example (#120)
- add qwen 3 support (#118)
Full Changelog: v0.3.7...v0.3.9