Adds the `ng_pip_list` command to view the underlying `uv pip list` output of the specified environment. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This implements the `ng_status` command to list all running servers on the system and ping each for a health check. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Need to set the `uv pip install` Python flag in Colab environments when launching servers. Usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true `; defaults to false. For #370. Needed for the notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a section for single-step training with Unsloth and TRL. Not sure if these should be broken into separate sections; left as one since the same notebook works for both, but it could be confusing. Also not sure if we should add more info about multi-step, (hopefully) coming soon. Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Removes TRL from the docs, leaving just Unsloth; it was unclear that they go together. Will add a TRL section when we have a standalone TRL notebook, or a section in TRL's docs too. --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079 Initially in #290, the `response_class=PlainTextResponse` was added to the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt to debug parsing server info for the `ng_status` command. This led to a parsing error in `load_from_global_config`. That command now uses its own separate endpoint, `server_instances`, so this needs to be removed. Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The overall coverage failure threshold is 95%, and test coverage is too low for `train_data_utils`, which brings down the overall coverage of the `ng_dev_test` suite. This covers some of those lingering test cases, bringing it from 89% to 97%. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This PR enables running Gym on Aviary environments. The two main concepts: - `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and manages multiple environments - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because: - Aviary environments define their transition function in `step()`. Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly) - Aviary tools are not guaranteed to be available until `reset()` is called. - A `/close` endpoint is added to tear down resources - `AviaryAgent`: analogous to `SimpleAgent`, but: - Request is an integer index (which is forwarded to `AviaryResourcesServer`). In general, we expect `env.reset()` to provide the first messages, not the calling code - All tool calls are sent to `/step` - We rely on the environment to tell us when we're done Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is pretty lightweight (most of the code in `notebook_app.py` is from defining a BixBench-compatible environment, not the integration). --------- Signed-off-by: Siddharth Narayanan <sid@futurehouse.org> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com> Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
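The `AviaryAgent` control flow described above can be sketched in-process as follows. `ToyEnv`, the message shapes, and the `done` flag are illustrative stand-ins for the Aviary `reset()`/`step()`/`close()` contract (served over `/step` and `/close` HTTP endpoints in the real integration), not the actual API.

```python
# Illustrative sketch only: real Aviary environments are reached over HTTP,
# and tool calls come from a model rather than a fixed policy.

class ToyEnv:
    """Stand-in environment: reset() provides the first messages,
    step() applies a tool call and reports when the episode is done."""

    def __init__(self, target: int):
        self.target = target
        self.total = 0

    def reset(self):
        # The environment, not the calling code, supplies the opening messages.
        return [{"role": "user", "content": f"Reach {self.target} by adding."}]

    def step(self, tool_call):
        # All tool calls go through step() so state is updated consistently;
        # calling bare tools directly could leave state inconsistent.
        self.total += tool_call["args"]["amount"]
        return {"observation": self.total, "done": self.total >= self.target}

    def close(self):
        # Tear down resources (the /close endpoint in the real server).
        self.total = 0


def run_episode(env, policy):
    messages = env.reset()
    steps, done = 0, False
    while not done:
        # We rely on the environment to tell us when we're done.
        result = env.step(policy(messages))
        done = result["done"]
        steps += 1
    env.close()
    return steps


env = ToyEnv(target=3)
steps = run_episode(env, policy=lambda msgs: {"name": "add", "args": {"amount": 1}})
```

Here the episode ends after three steps because the environment, not the agent, decides termination.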
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds more descriptive readme, reward profiling, and option for fractional or binary reward. Signed-off-by: abukharin-nv <abukharin@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
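The fractional-or-binary reward option described above might look like the following sketch; the function and the `mode` parameter name are hypothetical, not the PR's actual implementation.

```python
def compute_reward(num_passed: int, num_total: int, mode: str = "binary") -> float:
    """Hypothetical sketch of a fractional/binary reward switch.

    - binary: 1.0 only if every check passes, else 0.0
    - fractional: fraction of checks passed
    """
    if num_total == 0:
        return 0.0
    if mode == "binary":
        return 1.0 if num_passed == num_total else 0.0
    if mode == "fractional":
        return num_passed / num_total
    raise ValueError(f"unknown reward mode: {mode}")
```

Binary rewards give a cleaner training signal on all-or-nothing tasks, while fractional rewards reduce sparsity when partial credit is meaningful.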
This PR adds new environments for SWE tasks. The environments can be used for single-step patch generation, test generation, and LLM-as-a-judge. They have been tested on instances from SWE-bench, SWE-Gym, and SWE-rebench. The patch and test generation environments run candidates against unit tests in a containerized environment (Singularity). --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Integrating a new dataset using existing equivalency llm judge resource server. Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash License: https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE Train: 8040 unique samples Validation: 50 unique, randomly sampled from train Augmentation on the source (minimal): Added system prompt, output formatting requirement Example of env validation: - base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint) - Step 30 -> 12.50% on Terminal Bench Core - https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm Train: nl2bash-super-train-0901.jsonl Validation: nl2bash-super-validation-0901.jsonl https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/ ``` ng_download_dataset_from_gitlab \ +dataset_name=nl2bash-equivalency-judge \ +version=0.0.1 \ +artifact_fpath=nl2bash-super-train-0901.jsonl \ +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl ``` --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Make `agent_name` optional in CLI rollout collection ## Summary Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to use `agent_ref` from each data row instead. ## Motivation The NeMo-RL training code already respects per-row `agent_ref`, but the Gym CLI (`ng_collect_rollouts`) required a single hardcoded `agent_name`. This prevented multi-agent rollout collection via CLI. ## Changes - `rollout_collection.py`: Made `agent_name` field optional with `default=None` - Use `config.agent_name` if specified; otherwise fall back to `row["agent_ref"]["name"]` - Added validation error if neither source provides an agent name ## Behavior | Before | After | |--------|-------| | `+agent_name=...` required | `+agent_name=...` optional | | All rows use same agent | Rows can use different agents via `agent_ref` | --------- Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
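The fallback described under Changes can be sketched as follows; the helper function itself is illustrative, though `agent_ref` and `agent_name` mirror the names in the PR.

```python
def resolve_agent_name(config_agent_name, row: dict) -> str:
    """Sketch of the agent-name resolution described above:
    prefer the CLI-level agent_name, otherwise fall back to the
    per-row agent_ref, and fail loudly if neither is present."""
    if config_agent_name is not None:
        return config_agent_name
    agent_ref = row.get("agent_ref") or {}
    name = agent_ref.get("name")
    if name:
        return name
    raise ValueError(
        "No agent name: pass +agent_name=... or include agent_ref.name in each row"
    )
```

With this resolution order, a CLI-supplied name still overrides everything (preserving the old behavior), while omitting it lets each row route to its own agent.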
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Inspired by https://github.com/NVIDIA-NeMo/Gym/pull/318/files#diff-b56c7f31b7793b3a4ac265f84f4c84216f1ed15a3fbee855da9674a7da8714ff by @pjin-nvidia --------- Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The default artifact paths for the math_with_judge resource server don't match the filenames for the provided dataset (nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main). This results in an error when attempting to download the files automatically from Hugging Face. The artifact paths for both training and validation need to be updated to the names shown on Hugging Face for proper downloading. Signed-off-by: Robert Clark <roclark@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier which prevents it from being downloaded via Hugging Face
using the data preparation tools.
Without the HF identifier, run the following:
```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```
This will throw a warning:
```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```
And eventually this error:
```
Traceback (most recent call last):
File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
sys.exit(prepare_data())
^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
data_processor.run(global_config_dict)
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
with open(dataset_config.jsonl_fpath) as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```
This fix will download the validation file as intended and resolve the
errors.
Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The train and val data paths are swapped in the config. This PR updates them. --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# PR: Add ns_tools Resources Server
## Description
Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.
**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)
## Verifier Delegation
The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):
```
ns_tools.verify(request)
→ POST to math_with_judge/verify
→ returns reward from math_with_judge
```
This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
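An in-process sketch of this pass-through follows; the registry and function names are illustrative (in the real server, delegation is a POST to the verifier's `/verify` endpoint rather than a dict lookup).

```python
# Sketch of verifier delegation: ns_tools holds no scoring logic of its
# own and forwards verify() to the configured verifier, returning its
# reward unchanged. The registry below is a hypothetical stand-in.

VERIFIERS = {
    "math_with_judge": lambda req: (
        1.0 if req["response"] == req["expected_answer"] else 0.0
    ),
}

def ns_tools_verify(request: dict, verifier: str = "math_with_judge") -> float:
    """Pass-through verify: delegate to the configured verifier
    (default math_with_judge) and return its reward as-is."""
    return VERIFIERS[verifier](request)
```

Because the reward is returned unchanged, swapping in a different verifier requires no change to the `ns_tools` server itself.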
## Example Data Format
```json
{
"id": "aime25-0",
"question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
"expected_answer": "70",
"verifier_type": "math_with_judge",
"agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
"responses_create_params": {
"input": [
{"role": "user", "content": "Solve the following math problem..."}
],
"tools": [{
"type": "function",
"name": "stateful_python_code_exec",
"description": "Execute Python code in a stateful environment.",
"parameters": {
"type": "object",
"properties": {"code": {"type": "string"}},
"required": ["code"]
}
}]
}
}
```
---------
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds new `math_formal_lean` resource server for Lean4 formal theorem proving - Implements `/verify` endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0 - Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format - Comprehensive test suite (31 tests) ## Components | File | Description | |------|-------------| | `app.py` | Resource server with verify endpoint | | `sandbox_client.py` | HTTP client for Lean4 sandbox | | `proof_utils.py` | Proof extraction/building utilities | | `prepare_minif2f.py` | Dataset preparation script | | `README.md` | Documentation with licensing info | ## Test plan - [x] Unit tests pass (31/31) - [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5 samples) - [x] Tested with gpt-5.1-codex-max model - [x] Pre-commit lint checks pass 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
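The binary reward described above might be mapped from the sandbox compile result roughly as follows; `compile_result` and its keys are hypothetical stand-ins for whatever the Lean4 sandbox actually returns.

```python
# Sketch only: the /verify endpoint compiles the candidate proof in a
# sandbox container and returns reward 1.0 on success, 0.0 otherwise.
# The result-dict shape here is an assumption, not the real sandbox API.

def verify_reward(compile_result: dict) -> float:
    """Map a sandbox compile result to the 1.0/0.0 reward."""
    compiled_ok = (
        compile_result.get("returncode") == 0
        and not compile_result.get("timeout", False)
    )
    return 1.0 if compiled_ok else 0.0
```

Treating a timeout as a failure keeps the reward strictly binary: only proofs the compiler fully accepts earn 1.0.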
Per title. This PR retains the current default of returning transitions, but it is reasonable to change that default to match the other Gym agents. Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Refactoring the equivalency llm judge resource server into another judge-based resource server. Main changes include removing the regex logic and cleaning up the related configs. Train data for this environment is still TBD, but a working version: Data source: Sliced terminus prompts from different sources train_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl` validation_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl` example train config: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml` Example of env validation: base model: early sft checkpoint of nano v3 (`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`) Step 50 -> 21.25% on Terminal Bench Core https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi Next steps: Will expand this PR with configurable verification options including string matching, string similarity, and OpenAPI-based output schema validation. --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added new doc directories/article stubs for the topics identified in the 0.2.0 IA. Generated an initial pass of the structure and some starter content. This will enable contributors to focus more on the topic itself rather than the site build/toctree elements. **Feel free to blow away any initial content in these pages**. All stubbed pages have been marked with 🟡 in the toctree for easy discovery; remove the 🟡 once a page is finished. <img width="1800" height="1009" alt="image" src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db" /> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added a complete example of preparing a custom dataset for usage with NeMo Gym. The tutorial walks through downloading a dataset from Hugging Face or modifying from a different source, adding the "responses_create_params" field, writing a new resource server config, and preparing the data with "ng_prepare_data". This tutorial can be used as a guide for taking most arbitrary text-based datasets and modifying them to a format that is compatible with NeMo Gym for post-training. Signed-off-by: Robert Clark <roclark@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
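Adding the `responses_create_params` field to each row might look like the following sketch; the `prompt_field` name and the exact params shape are assumptions for illustration (the tutorial documents the real schema).

```python
def add_responses_create_params(row: dict, prompt_field: str = "question") -> dict:
    """Wrap a raw dataset row with a responses_create_params field
    as described in the tutorial. The shape here is illustrative:
    the raw prompt becomes a single user message in `input`."""
    out = dict(row)  # keep the original fields (e.g. expected_answer)
    out["responses_create_params"] = {
        "input": [{"role": "user", "content": row[prompt_field]}],
    }
    return out
```

After this transform, the rows can be written back out as JSONL and fed to `ng_prepare_data`.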
Enables using Environments Hub environments in NeMo Gym with NeMo RL for training. #446 --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds `.claude/skills/add-benchmark/` with a guided workflow for contributing new benchmarks and training environments - Covers the full lifecycle: scaffolding, data preparation, `verify()` implementation, YAML config, testing, reward profiling, and PR submission - Includes `references/patterns.md` with code templates for resource servers, agents, Ray subprocess execution, external tool auto-install, and dataset registry workflows - All content is generic (no benchmark-specific references) ## Test plan - [x] Verify skill files render correctly on GitHub - [x] Spot-check code patterns against existing resource servers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Jeff Farris <jfarris@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary - Adds `CLAUDE.md` to provide project context for Claude Code sessions - Covers architecture, CLI commands, configuration patterns, JSONL data schema, benchmark contribution workflow, code style, async patterns, external tool auto-install, and cluster gotchas - All content is generic (no benchmark-specific references) ## Test plan - [x] Verify CLAUDE.md renders correctly on GitHub - [x] Spot-check CLI commands against `pyproject.toml` entry points 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Jeff Farris <jfarris@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…llection Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…r handling, simplified judge prompt Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…racted answer for judge, add truncation and warmup support Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…er normalization, numeric fallback, max_steps limit Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs Signed-off-by: Frankie Siino <fsiino@nvidia.com> # Conflicts: # resources_servers/math_with_judge/configs/math_with_judge.yaml