Open
Labels: ⚡ PEFT, 🏋 RLOO, 🏋 SFT, 🐛 bug
Description
CI fails with dev dependencies: https://github.com/huggingface/trl/actions/runs/18712033739/job/53362520844
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
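The `''` at the end of the message is the repo id that was validated, i.e. an empty string. As a rough illustration of why it is rejected, here is a stand-alone sketch of the rules listed in the `validate_repo_id` docstring quoted in the stack trace. The function `looks_like_valid_repo_id` and its regex are hypothetical approximations written for this example; they are not the actual `REPO_ID_REGEX` from `huggingface_hub` and may differ from it in details.

```python
import re

# Hypothetical approximation of the documented rules; NOT huggingface_hub's real regex.
# A segment uses alphanumerics, '-', '_' or '.', and may not start or end with '-' or '.'.
_SEGMENT = r"[A-Za-z0-9_](?:[A-Za-z0-9._-]*[A-Za-z0-9_])?"
# Either "repo_name" or "namespace/repo_name".
_APPROX_REPO_ID = re.compile(rf"^{_SEGMENT}(?:/{_SEGMENT})?$")


def looks_like_valid_repo_id(repo_id):
    """Return True if repo_id passes the simplified checks above."""
    if not isinstance(repo_id, str):
        return False
    # Between 1 and 96 characters: this is the rule an empty string breaks.
    if not 1 <= len(repo_id) <= 96:
        return False
    # "--" and ".." are forbidden, and so is a ".git" suffix.
    if "--" in repo_id or ".." in repo_id or repo_id.endswith(".git"):
        return False
    return _APPROX_REPO_ID.match(repo_id) is not None


for candidate in ["", "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration"]:
    print(repr(candidate), looks_like_valid_repo_id(candidate))
```

Under these simplified rules the test model ids pass, so the problem appears to be the empty value reaching the validator rather than the ids themselves.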
Multiple tests fail with the same error:
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_importance_sampling - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_liger - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_prompt_completion - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_text_only_data - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
Stacktrace:
_ TestGRPOTrainer.test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] _
[gw3] linux -- Python 3.12.12 /__w/trl/trl/.venv/bin/python3
path_or_repo_id = '', filenames = ['processor_config.json']
cache_dir = '/github/home/.cache/huggingface/hub', force_download = False
proxies = None, token = None, revision = None, local_files_only = False
subfolder = '', repo_type = None
user_agent = 'transformers/5.0.0.dev0; python/3.12.12; session_id/2047a935e364492090f99c27947a738f; torch/2.9.0'
_raise_exceptions_for_gated_repo = False
_raise_exceptions_for_missing_entries = False
_raise_exceptions_for_connection_errors = False, _commit_hash = None
deprecated_kwargs = {}, full_filenames = ['processor_config.json']
existing_files = [], filename = 'processor_config.json', file_counter = 0
def cached_files(
path_or_repo_id: str | os.PathLike,
filenames: list[str],
cache_dir: str | os.PathLike | None = None,
force_download: bool = False,
proxies: dict[str, str] | None = None,
token: bool | str | None = None,
revision: str | None = None,
local_files_only: bool = False,
subfolder: str = "",
repo_type: str | None = None,
user_agent: str | dict[str, str] | None = None,
_raise_exceptions_for_gated_repo: bool = True,
_raise_exceptions_for_missing_entries: bool = True,
_raise_exceptions_for_connection_errors: bool = True,
_commit_hash: str | None = None,
**deprecated_kwargs,
) -> str | None:
"""
Tries to locate several files in a local folder and repo, downloads and cache them if necessary.
Args:
path_or_repo_id (`str` or `os.PathLike`):
This can be either:
- a string, the *model id* of a model repo on huggingface.co.
- a path to a *directory* potentially containing the file.
filenames (`list[str]`):
The name of all the files to locate in `path_or_repo`.
cache_dir (`str` or `os.PathLike`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the standard
cache should not be used.
force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force to (re-)download the configuration files and override the cached versions if they
exist.
proxies (`dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `hf auth login` (stored in `~/.huggingface`).
revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
local_files_only (`bool`, *optional*, defaults to `False`):
If `True`, will only try to load the tokenizer configuration from local files.
subfolder (`str`, *optional*, defaults to `""`):
In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
specify the folder name here.
repo_type (`str`, *optional*):
Specify the repo type (useful when downloading from a space for instance).
Private args:
_raise_exceptions_for_gated_repo (`bool`):
if False, do not raise an exception for gated repo error but return None.
_raise_exceptions_for_missing_entries (`bool`):
if False, do not raise an exception for missing entries but return None.
_raise_exceptions_for_connection_errors (`bool`):
if False, do not raise an exception for connection errors but return None.
_commit_hash (`str`, *optional*):
passed when we are chaining several calls to various files (e.g. when loading a tokenizer or
a pipeline). If files are cached for this commit hash, avoid calls to head and get from the cache.
<Tip>
Passing `token=True` is required when you want to use a private model.
</Tip>
Returns:
`Optional[str]`: Returns the resolved file (to the cache folder if downloaded from a repo).
Examples:
```python
# Download a model weight from the Hub and cache it.
model_weights_file = cached_file("google-bert/bert-base-uncased", "pytorch_model.bin")
```
"""
if is_offline_mode() and not local_files_only:
logger.info("Offline mode: forcing local_files_only=True")
local_files_only = True
if subfolder is None:
subfolder = ""
# Add folder to filenames
full_filenames = [os.path.join(subfolder, file) for file in filenames]
path_or_repo_id = str(path_or_repo_id)
existing_files = []
for filename in full_filenames:
if os.path.isdir(path_or_repo_id):
resolved_file = os.path.join(path_or_repo_id, filename)
if not os.path.isfile(resolved_file):
if _raise_exceptions_for_missing_entries and filename != os.path.join(subfolder, "config.json"):
revision_ = "main" if revision is None else revision
raise OSError(
f"{path_or_repo_id} does not appear to have a file named {filename}. Checkout "
f"'https://huggingface.co/{path_or_repo_id}/tree/{revision_}' for available files."
)
else:
continue
existing_files.append(resolved_file)
if os.path.isdir(path_or_repo_id):
return existing_files if existing_files else None
if cache_dir is None:
cache_dir = TRANSFORMERS_CACHE
if isinstance(cache_dir, Path):
cache_dir = str(cache_dir)
existing_files = []
file_counter = 0
if _commit_hash is not None and not force_download:
for filename in full_filenames:
# If the file is cached under that commit hash, we return it directly.
resolved_file = try_to_load_from_cache(
path_or_repo_id, filename, cache_dir=cache_dir, revision=_commit_hash, repo_type=repo_type
)
if resolved_file is not None:
if resolved_file is not _CACHED_NO_EXIST:
file_counter += 1
existing_files.append(resolved_file)
elif not _raise_exceptions_for_missing_entries:
file_counter += 1
else:
raise OSError(f"Could not locate {filename} inside {path_or_repo_id}.")
# Either all the files were found, or some were _CACHED_NO_EXIST but we do not raise for missing entries
if file_counter == len(full_filenames):
return existing_files if len(existing_files) > 0 else None
user_agent = http_user_agent(user_agent)
# download the files if needed
try:
if len(full_filenames) == 1:
# This is slightly better for only 1 file
> hf_hub_download(
path_or_repo_id,
filenames[0],
subfolder=None if len(subfolder) == 0 else subfolder,
repo_type=repo_type,
revision=revision,
cache_dir=cache_dir,
user_agent=user_agent,
force_download=force_download,
proxies=proxies,
token=token,
local_files_only=local_files_only,
)
.venv/lib/python3.12/site-packages/transformers/utils/hub.py:469:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
validate_repo_id(arg_value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
repo_id = ''
def validate_repo_id(repo_id: str) -> None:
"""Validate `repo_id` is valid.
This is not meant to replace the proper validation made on the Hub but rather to
avoid local inconsistencies whenever possible (example: passing `repo_type` in the
`repo_id` is forbidden).
Rules:
- Between 1 and 96 characters.
- Either "repo_name" or "namespace/repo_name"
- [a-zA-Z0-9] or "-", "_", "."
- "--" and ".." are forbidden
Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
Example:
```py
>>> from huggingface_hub.utils import validate_repo_id
>>> validate_repo_id(repo_id="valid_repo_id")
>>> validate_repo_id(repo_id="other..repo..id")
huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
```
Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
In moon-landing (internal repository):
- https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
- https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
"""
if not isinstance(repo_id, str):
# Typically, a Path is not a repo_id
raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
if repo_id.count("/") > 1:
raise HFValidationError(
"Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
f" '{repo_id}'. Use `repo_type` argument if needed."
)
if not REPO_ID_REGEX.match(repo_id):
> raise HFValidationError(
"Repo id must use alphanumeric chars, '-', '_' or '.'."
" The name cannot start or end with '-' or '.' and the maximum length is 96:"
f" '{repo_id}'."
)
E huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
During handling of the above exception, another exception occurred:
self = <tests.test_grpo_trainer.TestGRPOTrainer object at 0x7fe900d1c470>
model_id = 'trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration'
@pytest.mark.parametrize(
"model_id",
[
"trl-internal-testing/tiny-Gemma3ForConditionalGeneration",
"trl-internal-testing/tiny-LlavaNextForConditionalGeneration",
"trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
"trl-internal-testing/tiny-Qwen2VLForConditionalGeneration",
# "trl-internal-testing/tiny-SmolVLMForConditionalGeneration", seems not to support bf16 properly
],
)
@require_vision
def test_training_vlm(self, model_id):
dataset = load_dataset("trl-internal-testing/zen-image", "conversational_prompt_only", split="train")
def reward_func(completions, **kwargs):
"""Reward function that rewards longer completions."""
return [float(len(completion[0]["content"])) for completion in completions]
training_args = GRPOConfig(
output_dir=self.tmp_dir,
learning_rate=0.1, # increase the learning rate to speed up the test
per_device_train_batch_size=3, # reduce the batch size to reduce memory usage
num_generations=3, # reduce the number of generations to reduce memory usage
max_completion_length=8, # reduce the completion length to reduce memory usage
max_prompt_length=None, # disable prompt truncation, because usually, models don't support it
report_to="none",
)
> trainer = GRPOTrainer(
model=model_id,
reward_funcs=reward_func,
args=training_args,
train_dataset=dataset,
)
tests/test_grpo_trainer.py:1279:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
trl/trainer/grpo_trainer.py:281: in __init__
processing_class = AutoProcessor.from_pretrained(model.config._name_or_path, truncation_side="left")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py:287: in from_pretrained
processor_config_file = cached_file(pretrained_model_name_or_path, PROCESSOR_NAME, **cached_file_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/utils/hub.py:326: in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/transformers/utils/hub.py:520: in cached_files
_get_cache_file_to_return(path_or_repo_id, filename, cache_dir, revision, repo_type)
.venv/lib/python3.12/site-packages/transformers/utils/hub.py:152: in _get_cache_file_to_return
resolved_file = try_to_load_from_cache(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
validate_repo_id(arg_value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
repo_id = ''
def validate_repo_id(repo_id: str) -> None:
"""Validate `repo_id` is valid.
This is not meant to replace the proper validation made on the Hub but rather to
avoid local inconsistencies whenever possible (example: passing `repo_type` in the
`repo_id` is forbidden).
Rules:
- Between 1 and 96 characters.
- Either "repo_name" or "namespace/repo_name"
- [a-zA-Z0-9] or "-", "_", "."
- "--" and ".." are forbidden
Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
Example:
```py
>>> from huggingface_hub.utils import validate_repo_id
>>> validate_repo_id(repo_id="valid_repo_id")
>>> validate_repo_id(repo_id="other..repo..id")
huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
```
Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
In moon-landing (internal repository):
- https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
- https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
"""
if not isinstance(repo_id, str):
# Typically, a Path is not a repo_id
raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
if repo_id.count("/") > 1:
raise HFValidationError(
"Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
f" '{repo_id}'. Use `repo_type` argument if needed."
)
if not REPO_ID_REGEX.match(repo_id):
> raise HFValidationError(
"Repo id must use alphanumeric chars, '-', '_' or '.'."
" The name cannot start or end with '-' or '.' and the maximum length is 96:"
f" '{repo_id}'."
)
E huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
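In the trace, `AutoProcessor.from_pretrained` is called with `model.config._name_or_path`, and the value that reaches `validate_repo_id` is `''`. One possible mitigation, sketched here under the assumption that the original model id string is still available at that point in the trainer, is to fall back to it when the config value is empty. The helper `resolve_processor_source` is hypothetical and is not part of TRL or transformers; the `SimpleNamespace` object only stands in for a model config.

```python
from types import SimpleNamespace


def resolve_processor_source(config, fallback_id=None):
    """Pick a usable name or path for loading a processor.

    Hypothetical helper: prefers config._name_or_path and falls back to
    fallback_id when that attribute is missing or empty.
    """
    name_or_path = getattr(config, "_name_or_path", "") or ""
    if name_or_path:
        return name_or_path
    if fallback_id:
        return fallback_id
    raise ValueError("config._name_or_path is empty and no fallback model id was given")


# Stand-in for a config whose _name_or_path is empty, as observed in the trace.
empty_config = SimpleNamespace(_name_or_path="")
source = resolve_processor_source(
    empty_config,
    fallback_id="trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
)
print(source)
```

Raising a `ValueError` when both values are empty would surface the problem at the call site instead of deep inside the Hub validation code. Whether a guard like this is the right fix, or whether `_name_or_path` should be populated upstream, is left open here.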