Adds the `ng_pip_list` command to view the underlying `uv pip list` output of the specified environment. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This implements the `ng_status` command to list all running servers on the system and ping each for a health check. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Need to set the `uv pip install` Python flag in Colab environments when launching servers. Usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true `; defaults to false. For #370. Needed for the notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a section for single-step training with Unsloth and TRL. Not sure if these should be broken into separate sections; left as one since the same notebook works for both, but it could be confusing. Also not sure if we should add more info about multi-step, (hopefully) coming soon. Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Removes TRL from the docs, leaving just Unsloth; it was unclear that they go together. Will add a TRL section when we have a standalone TRL notebook, or a section in TRL's docs too. --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079 Initially in #290, the `response_class=PlainTextResponse` was added to the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt to debug parsing server info for the `ng_status` command. This led to a parsing error in `load_from_global_config`. That command now uses its own separate endpoint, `server_instances`, so this needs to be removed. Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The overall coverage failure threshold is 95%, and test coverage is too low for `train_data_utils`, which brings down the overall coverage of the `ng_dev_test` suite. This covers some of those lingering test cases, bringing it from 89% to 97%. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This PR enables running Gym on Aviary environments. The two main concepts: - `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and manages multiple environments - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because: - Aviary environments define their transition function in `step()`. Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly) - Aviary tools are not guaranteed to be available until `reset()` is called. - A `/close` endpoint is added to tear down resources - `AviaryAgent`: analogous to `SimpleAgent`, but: - Request is an integer index (which is forwarded to `AviaryResourcesServer`). In general, we expect `env.reset()` to provide the first messages, not the calling code - All tool calls are sent to `/step` - We rely on the environment to tell us when we're done Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is pretty lightweight (most of the code in `notebook_app.py` is from defining a BixBench-compatible environment, not the integration). --------- Signed-off-by: Siddharth Narayanan <sid@futurehouse.org> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com> Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
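The `AviaryAgent` control flow described above can be sketched in-process as follows. `ToyEnv`, the message shapes, and the `done` flag are illustrative stand-ins for the Aviary `reset()`/`step()`/`close()` contract (served over `/step` and `/close` HTTP endpoints in the real integration), not the actual API.

```python
# Illustrative sketch only: real Aviary environments are reached over HTTP,
# and tool calls come from a model rather than a fixed policy.

class ToyEnv:
    """Stand-in environment: reset() provides the first messages,
    step() applies a tool call and reports when the episode is done."""

    def __init__(self, target: int):
        self.target = target
        self.total = 0

    def reset(self):
        # The environment, not the calling code, supplies the opening messages.
        return [{"role": "user", "content": f"Reach {self.target} by adding."}]

    def step(self, tool_call):
        # All tool calls go through step() so state is updated consistently;
        # calling bare tools directly could leave state inconsistent.
        self.total += tool_call["args"]["amount"]
        return {"observation": self.total, "done": self.total >= self.target}

    def close(self):
        # Tear down resources (the /close endpoint in the real server).
        self.total = 0


def run_episode(env, policy):
    messages = env.reset()
    steps, done = 0, False
    while not done:
        # We rely on the environment to tell us when we're done.
        result = env.step(policy(messages))
        done = result["done"]
        steps += 1
    env.close()
    return steps


env = ToyEnv(target=3)
steps = run_episode(env, policy=lambda msgs: {"name": "add", "args": {"amount": 1}})
```

Here the episode ends after three steps because the environment, not the agent, decides termination.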
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds more descriptive readme, reward profiling, and option for fractional or binary reward. Signed-off-by: abukharin-nv <abukharin@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
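The fractional-or-binary reward option described above might look like the following sketch; the function and the `mode` parameter name are hypothetical, not the PR's actual implementation.

```python
def compute_reward(num_passed: int, num_total: int, mode: str = "binary") -> float:
    """Hypothetical sketch of a fractional/binary reward switch.

    - binary: 1.0 only if every check passes, else 0.0
    - fractional: fraction of checks passed
    """
    if num_total == 0:
        return 0.0
    if mode == "binary":
        return 1.0 if num_passed == num_total else 0.0
    if mode == "fractional":
        return num_passed / num_total
    raise ValueError(f"unknown reward mode: {mode}")
```

Binary rewards give a cleaner training signal on all-or-nothing tasks, while fractional rewards reduce sparsity when partial credit is meaningful.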
This PR adds new environments for SWE tasks. The environments can be used for single-step patch generation, test generation, and LLM-as-a-judge. They have been tested on instances from SWE-bench, SWE-Gym, and SWE-rebench. The patch and test generation environments run candidates against unit tests in a containerized environment (Singularity). --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Integrating a new dataset using existing equivalency llm judge resource server. Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash License: https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE Train: 8040 unique samples Validation: 50 unique, randomly sampled from train Augmentation on the source (minimal): Added system prompt, output formatting requirement Example of env validation: - base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint) - Step 30 -> 12.50% on Terminal Bench Core - https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm Train: nl2bash-super-train-0901.jsonl Validation: nl2bash-super-validation-0901.jsonl https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/ ``` ng_download_dataset_from_gitlab \ +dataset_name=nl2bash-equivalency-judge \ +version=0.0.1 \ +artifact_fpath=nl2bash-super-train-0901.jsonl \ +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl ``` --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Make `agent_name` optional in CLI rollout collection ## Summary Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to use `agent_ref` from each data row instead. ## Motivation The NeMo-RL training code already respects per-row `agent_ref`, but the Gym CLI (`ng_collect_rollouts`) required a single hardcoded `agent_name`. This prevented multi-agent rollout collection via CLI. ## Changes - `rollout_collection.py`: Made `agent_name` field optional with `default=None` - Use `config.agent_name` if specified; otherwise fall back to `row["agent_ref"]["name"]` - Added validation error if neither source provides an agent name ## Behavior | Before | After | |--------|-------| | `+agent_name=...` required | `+agent_name=...` optional | | All rows use same agent | Rows can use different agents via `agent_ref` | --------- Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
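The fallback described under Changes can be sketched as follows; the helper function itself is illustrative, though `agent_ref` and `agent_name` mirror the names in the PR.

```python
def resolve_agent_name(config_agent_name, row: dict) -> str:
    """Sketch of the agent-name resolution described above:
    prefer the CLI-level agent_name, otherwise fall back to the
    per-row agent_ref, and fail loudly if neither is present."""
    if config_agent_name is not None:
        return config_agent_name
    agent_ref = row.get("agent_ref") or {}
    name = agent_ref.get("name")
    if name:
        return name
    raise ValueError(
        "No agent name: pass +agent_name=... or include agent_ref.name in each row"
    )
```

With this resolution order, a CLI-supplied name still overrides everything (preserving the old behavior), while omitting it lets each row route to its own agent.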
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Inspired by https://github.com/NVIDIA-NeMo/Gym/pull/318/files#diff-b56c7f31b7793b3a4ac265f84f4c84216f1ed15a3fbee855da9674a7da8714ff by @pjin-nvidia --------- Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The default artifact paths for the math_with_judge resource server don't match the filenames for the provided dataset (nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main). This results in an error when attempting to download the files automatically from Hugging Face. The artifact paths for both training and validation need to be updated to the names shown on Hugging Face for proper downloading. Signed-off-by: Robert Clark <roclark@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier which prevents it from being downloaded via Hugging Face
using the data preparation tools.
Without the HF identifier, run the following:
```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```
This will throw a warning:
```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```
And eventually this error:
```
Traceback (most recent call last):
File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
sys.exit(prepare_data())
^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
data_processor.run(global_config_dict)
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
with open(dataset_config.jsonl_fpath) as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```
This fix will download the validation file as intended and resolve the
errors.
Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The train and val data paths are swapped in the config. This PR updates them. --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# PR: Add ns_tools Resources Server
## Description
Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.
**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)
## Verifier Delegation
The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):
```
ns_tools.verify(request)
→ POST to math_with_judge/verify
→ returns reward from math_with_judge
```
This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
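An in-process sketch of this pass-through follows; the registry and function names are illustrative (in the real server, delegation is a POST to the verifier's `/verify` endpoint rather than a dict lookup).

```python
# Sketch of verifier delegation: ns_tools holds no scoring logic of its
# own and forwards verify() to the configured verifier, returning its
# reward unchanged. The registry below is a hypothetical stand-in.

VERIFIERS = {
    "math_with_judge": lambda req: (
        1.0 if req["response"] == req["expected_answer"] else 0.0
    ),
}

def ns_tools_verify(request: dict, verifier: str = "math_with_judge") -> float:
    """Pass-through verify: delegate to the configured verifier
    (default math_with_judge) and return its reward as-is."""
    return VERIFIERS[verifier](request)
```

Because the reward is returned unchanged, swapping in a different verifier requires no change to the `ns_tools` server itself.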
## Example Data Format
```json
{
"id": "aime25-0",
"question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
"expected_answer": "70",
"verifier_type": "math_with_judge",
"agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
"responses_create_params": {
"input": [
{"role": "user", "content": "Solve the following math problem..."}
],
"tools": [{
"type": "function",
"name": "stateful_python_code_exec",
"description": "Execute Python code in a stateful environment.",
"parameters": {
"type": "object",
"properties": {"code": {"type": "string"}},
"required": ["code"]
}
}]
}
}
```
---------
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds new `math_formal_lean` resource server for Lean4 formal theorem proving - Implements `/verify` endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0 - Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format - Comprehensive test suite (31 tests) ## Components | File | Description | |------|-------------| | `app.py` | Resource server with verify endpoint | | `sandbox_client.py` | HTTP client for Lean4 sandbox | | `proof_utils.py` | Proof extraction/building utilities | | `prepare_minif2f.py` | Dataset preparation script | | `README.md` | Documentation with licensing info | ## Test plan - [x] Unit tests pass (31/31) - [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5 samples) - [x] Tested with gpt-5.1-codex-max model - [x] Pre-commit lint checks pass 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
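The binary reward described above might be mapped from the sandbox compile result roughly as follows; `compile_result` and its keys are hypothetical stand-ins for whatever the Lean4 sandbox actually returns.

```python
# Sketch only: the /verify endpoint compiles the candidate proof in a
# sandbox container and returns reward 1.0 on success, 0.0 otherwise.
# The result-dict shape here is an assumption, not the real sandbox API.

def verify_reward(compile_result: dict) -> float:
    """Map a sandbox compile result to the 1.0/0.0 reward."""
    compiled_ok = (
        compile_result.get("returncode") == 0
        and not compile_result.get("timeout", False)
    )
    return 1.0 if compiled_ok else 0.0
```

Treating a timeout as a failure keeps the reward strictly binary: only proofs the compiler fully accepts earn 1.0.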
Per title. This PR retains the current default of returning transitions, but it is reasonable to change that default to match the other Gym agents. Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Refactoring the equivalency llm judge resource server into another judge-based resource server. Main changes include removing the regex logic and cleaning up the related configs. Train data for this environment is still TBD, but a working version: Data source: Sliced terminus prompts from different sources train_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl` validation_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl` example train config: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml` Example of env validation: base model: early sft checkpoint of nano v3 (`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`) Step 50 -> 21.25% on Terminal Bench Core https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi Next steps: Will expand this PR with configurable verification options including string matching, string similarity, and OpenAPI-based output schema validation. --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added new doc directories/article stubs for the topics identified in the 0.2.0 IA. Generated an initial pass of the structure and some starter content. This will enable contributors to focus more on the topic itself rather than the site build/toctree elements. **Feel free to blow away any initial content in these pages**. All stubbed pages have been marked with 🟡 in the toctree for easy discovery; remove the 🟡 once a page is finished. <img width="1800" height="1009" alt="image" src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db" /> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added a complete example of preparing a custom dataset for usage with NeMo Gym. The tutorial walks through downloading a dataset from Hugging Face or modifying from a different source, adding the "responses_create_params" field, writing a new resource server config, and preparing the data with "ng_prepare_data". This tutorial can be used as a guide for taking most arbitrary text-based datasets and modifying them to a format that is compatible with NeMo Gym for post-training. Signed-off-by: Robert Clark <roclark@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
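Adding the `responses_create_params` field to each row might look like the following sketch; the `prompt_field` name and the exact params shape are assumptions for illustration (the tutorial documents the real schema).

```python
def add_responses_create_params(row: dict, prompt_field: str = "question") -> dict:
    """Wrap a raw dataset row with a responses_create_params field
    as described in the tutorial. The shape here is illustrative:
    the raw prompt becomes a single user message in `input`."""
    out = dict(row)  # keep the original fields (e.g. expected_answer)
    out["responses_create_params"] = {
        "input": [{"role": "user", "content": row[prompt_field]}],
    }
    return out
```

After this transform, the rows can be written back out as JSONL and fed to `ng_prepare_data`.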
Enables using Environments Hub environments in NeMo Gym with NeMo RL for training. #446 --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds `.claude/skills/add-benchmark/` with a guided workflow for contributing new benchmarks and training environments - Covers the full lifecycle: scaffolding, data preparation, `verify()` implementation, YAML config, testing, reward profiling, and PR submission - Includes `references/patterns.md` with code templates for resource servers, agents, Ray subprocess execution, external tool auto-install, and dataset registry workflows - All content is generic (no benchmark-specific references) ## Test plan - [x] Verify skill files render correctly on GitHub - [x] Spot-check code patterns against existing resource servers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Jeff Farris <jfarris@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds `CLAUDE.md` to provide project context for Claude Code sessions - Covers architecture, CLI commands, configuration patterns, JSONL data schema, benchmark contribution workflow, code style, async patterns, external tool auto-install, and cluster gotchas - All content is generic (no benchmark-specific references) ## Test plan - [x] Verify CLAUDE.md renders correctly on GitHub - [x] Spot-check CLI commands against `pyproject.toml` entry points 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Jeff Farris <jfarris@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…llection Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…r handling, simplified judge prompt Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…racted answer for judge, add truncation and warmup support Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…er normalization, numeric fallback, max_steps limit Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs Signed-off-by: Frankie Siino <fsiino@nvidia.com> # Conflicts: # resources_servers/math_with_judge/configs/math_with_judge.yaml