
CI fails with dev dependencies: HFValidationError: Repo id must use alphanumeric chars #4323

@albertvillanova

CI fails with dev dependencies: https://github.com/huggingface/trl/actions/runs/18712033739/job/53362520844

huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
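
The trailing `''` in the message is the value that was validated, so an empty string is being passed as the repo id. Below is a minimal sketch of why an empty string is rejected. It uses a simplified stand-in for the rules quoted in the `validate_repo_id` docstring in the stack trace. The regex and the helper name `looks_like_repo_id` are assumptions for illustration, not the actual `huggingface_hub` implementation.

```python
import re

# Simplified stand-in for the rules in the validate_repo_id docstring.
# This is NOT the regex used by huggingface_hub; it only approximates it.
_NAME = r"[A-Za-z0-9_](?:[A-Za-z0-9._-]*[A-Za-z0-9_])?"
SKETCH_REPO_ID = re.compile(rf"^(?:{_NAME}/)?{_NAME}$")


def looks_like_repo_id(repo_id) -> bool:
    """Approximate check (hypothetical helper, for illustration only)."""
    # Must be a non-empty string of at most 96 characters.
    if not isinstance(repo_id, str) or not 1 <= len(repo_id) <= 96:
        return False
    # "--" and ".." are forbidden, and the docstring lists "foo.git" as invalid.
    if "--" in repo_id or ".." in repo_id or repo_id.endswith(".git"):
        return False
    # Either "repo_name" or "namespace/repo_name", not starting/ending with '-' or '.'.
    return SKETCH_REPO_ID.match(repo_id) is not None


if __name__ == "__main__":
    for candidate in ["", "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration"]:
        print(repr(candidate), looks_like_repo_id(candidate))
```

The empty string fails the length rule before any character check, which matches the `''` shown at the end of the error.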

Multiple tests fail:

  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_importance_sampling - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_liger - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_prompt_completion - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_text_only_data - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.

Stacktrace:

_ TestGRPOTrainer.test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] _
  [gw3] linux -- Python 3.12.12 /__w/trl/trl/.venv/bin/python3
  
  path_or_repo_id = '', filenames = ['processor_config.json']
  cache_dir = '/github/home/.cache/huggingface/hub', force_download = False
  proxies = None, token = None, revision = None, local_files_only = False
  subfolder = '', repo_type = None
  user_agent = 'transformers/5.0.0.dev0; python/3.12.12; session_id/2047a935e364492090f99c27947a738f; torch/2.9.0'
  _raise_exceptions_for_gated_repo = False
  _raise_exceptions_for_missing_entries = False
  _raise_exceptions_for_connection_errors = False, _commit_hash = None
  deprecated_kwargs = {}, full_filenames = ['processor_config.json']
  existing_files = [], filename = 'processor_config.json', file_counter = 0
  
      def cached_files(
          path_or_repo_id: str | os.PathLike,
          filenames: list[str],
          cache_dir: str | os.PathLike | None = None,
          force_download: bool = False,
          proxies: dict[str, str] | None = None,
          token: bool | str | None = None,
          revision: str | None = None,
          local_files_only: bool = False,
          subfolder: str = "",
          repo_type: str | None = None,
          user_agent: str | dict[str, str] | None = None,
          _raise_exceptions_for_gated_repo: bool = True,
          _raise_exceptions_for_missing_entries: bool = True,
          _raise_exceptions_for_connection_errors: bool = True,
          _commit_hash: str | None = None,
          **deprecated_kwargs,
      ) -> str | None:
          """
          Tries to locate several files in a local folder and repo, downloads and cache them if necessary.
      
          Args:
              path_or_repo_id (`str` or `os.PathLike`):
                  This can be either:
                  - a string, the *model id* of a model repo on huggingface.co.
                  - a path to a *directory* potentially containing the file.
              filenames (`list[str]`):
                  The name of all the files to locate in `path_or_repo`.
              cache_dir (`str` or `os.PathLike`, *optional*):
                  Path to a directory in which a downloaded pretrained model configuration should be cached if the standard
                  cache should not be used.
              force_download (`bool`, *optional*, defaults to `False`):
                  Whether or not to force to (re-)download the configuration files and override the cached versions if they
                  exist.
              proxies (`dict[str, str]`, *optional*):
                  A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
                  'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
              token (`str` or *bool*, *optional*):
                  The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
                  when running `hf auth login` (stored in `~/.huggingface`).
              revision (`str`, *optional*, defaults to `"main"`):
                  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
                  git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
                  identifier allowed by git.
              local_files_only (`bool`, *optional*, defaults to `False`):
                  If `True`, will only try to load the tokenizer configuration from local files.
              subfolder (`str`, *optional*, defaults to `""`):
                  In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
                  specify the folder name here.
              repo_type (`str`, *optional*):
                  Specify the repo type (useful when downloading from a space for instance).
      
          Private args:
              _raise_exceptions_for_gated_repo (`bool`):
                  if False, do not raise an exception for gated repo error but return None.
              _raise_exceptions_for_missing_entries (`bool`):
                  if False, do not raise an exception for missing entries but return None.
              _raise_exceptions_for_connection_errors (`bool`):
                  if False, do not raise an exception for connection errors but return None.
              _commit_hash (`str`, *optional*):
                  passed when we are chaining several calls to various files (e.g. when loading a tokenizer or
                  a pipeline). If files are cached for this commit hash, avoid calls to head and get from the cache.
      
          <Tip>
      
          Passing `token=True` is required when you want to use a private model.
      
          </Tip>
      
          Returns:
              `Optional[str]`: Returns the resolved file (to the cache folder if downloaded from a repo).
      
          Examples:
      
          ```python
          # Download a model weight from the Hub and cache it.
          model_weights_file = cached_file("google-bert/bert-base-uncased", "pytorch_model.bin")
          ```
          """
          if is_offline_mode() and not local_files_only:
              logger.info("Offline mode: forcing local_files_only=True")
              local_files_only = True
          if subfolder is None:
              subfolder = ""
      
          # Add folder to filenames
          full_filenames = [os.path.join(subfolder, file) for file in filenames]
      
          path_or_repo_id = str(path_or_repo_id)
          existing_files = []
          for filename in full_filenames:
              if os.path.isdir(path_or_repo_id):
                  resolved_file = os.path.join(path_or_repo_id, filename)
                  if not os.path.isfile(resolved_file):
                      if _raise_exceptions_for_missing_entries and filename != os.path.join(subfolder, "config.json"):
                          revision_ = "main" if revision is None else revision
                          raise OSError(
                              f"{path_or_repo_id} does not appear to have a file named {filename}. Checkout "
                              f"'https://huggingface.co/{path_or_repo_id}/tree/{revision_}' for available files."
                          )
                      else:
                          continue
                  existing_files.append(resolved_file)
      
          if os.path.isdir(path_or_repo_id):
              return existing_files if existing_files else None
      
          if cache_dir is None:
              cache_dir = TRANSFORMERS_CACHE
          if isinstance(cache_dir, Path):
              cache_dir = str(cache_dir)
      
          existing_files = []
          file_counter = 0
          if _commit_hash is not None and not force_download:
              for filename in full_filenames:
                  # If the file is cached under that commit hash, we return it directly.
                  resolved_file = try_to_load_from_cache(
                      path_or_repo_id, filename, cache_dir=cache_dir, revision=_commit_hash, repo_type=repo_type
                  )
                  if resolved_file is not None:
                      if resolved_file is not _CACHED_NO_EXIST:
                          file_counter += 1
                          existing_files.append(resolved_file)
                      elif not _raise_exceptions_for_missing_entries:
                          file_counter += 1
                      else:
                          raise OSError(f"Could not locate {filename} inside {path_or_repo_id}.")
      
          # Either all the files were found, or some were _CACHED_NO_EXIST but we do not raise for missing entries
          if file_counter == len(full_filenames):
              return existing_files if len(existing_files) > 0 else None
      
          user_agent = http_user_agent(user_agent)
          # download the files if needed
          try:
              if len(full_filenames) == 1:
                  # This is slightly better for only 1 file
  >               hf_hub_download(
                      path_or_repo_id,
                      filenames[0],
                      subfolder=None if len(subfolder) == 0 else subfolder,
                      repo_type=repo_type,
                      revision=revision,
                      cache_dir=cache_dir,
                      user_agent=user_agent,
                      force_download=force_download,
                      proxies=proxies,
                      token=token,
                      local_files_only=local_files_only,
                  )
  
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:469: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
      validate_repo_id(arg_value)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  repo_id = ''
  
      def validate_repo_id(repo_id: str) -> None:
          """Validate `repo_id` is valid.
      
          This is not meant to replace the proper validation made on the Hub but rather to
          avoid local inconsistencies whenever possible (example: passing `repo_type` in the
          `repo_id` is forbidden).
      
          Rules:
          - Between 1 and 96 characters.
          - Either "repo_name" or "namespace/repo_name"
          - [a-zA-Z0-9] or "-", "_", "."
          - "--" and ".." are forbidden
      
          Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
      
          Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
      
          Example:
          ```py
          >>> from huggingface_hub.utils import validate_repo_id
          >>> validate_repo_id(repo_id="valid_repo_id")
          >>> validate_repo_id(repo_id="other..repo..id")
          huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
          ```
      
          Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
          In moon-landing (internal repository):
          - https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
          - https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
          """
          if not isinstance(repo_id, str):
              # Typically, a Path is not a repo_id
              raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
      
          if repo_id.count("/") > 1:
              raise HFValidationError(
                  "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
                  f" '{repo_id}'. Use `repo_type` argument if needed."
              )
      
          if not REPO_ID_REGEX.match(repo_id):
  >           raise HFValidationError(
                  "Repo id must use alphanumeric chars, '-', '_' or '.'."
                  " The name cannot start or end with '-' or '.' and the maximum length is 96:"
                  f" '{repo_id}'."
              )
  E           huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
  
  During handling of the above exception, another exception occurred:
  
  self = <tests.test_grpo_trainer.TestGRPOTrainer object at 0x7fe900d1c470>
  model_id = 'trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration'
  
      @pytest.mark.parametrize(
          "model_id",
          [
              "trl-internal-testing/tiny-Gemma3ForConditionalGeneration",
              "trl-internal-testing/tiny-LlavaNextForConditionalGeneration",
              "trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
              "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration",
              # "trl-internal-testing/tiny-SmolVLMForConditionalGeneration", seems not to support bf16 properly
          ],
      )
      @require_vision
      def test_training_vlm(self, model_id):
          dataset = load_dataset("trl-internal-testing/zen-image", "conversational_prompt_only", split="train")
      
          def reward_func(completions, **kwargs):
              """Reward function that rewards longer completions."""
              return [float(len(completion[0]["content"])) for completion in completions]
      
          training_args = GRPOConfig(
              output_dir=self.tmp_dir,
              learning_rate=0.1,  # increase the learning rate to speed up the test
              per_device_train_batch_size=3,  # reduce the batch size to reduce memory usage
              num_generations=3,  # reduce the number of generations to reduce memory usage
              max_completion_length=8,  # reduce the completion length to reduce memory usage
              max_prompt_length=None,  # disable prompt truncation, because usually, models don't support it
              report_to="none",
          )
  >       trainer = GRPOTrainer(
              model=model_id,
              reward_funcs=reward_func,
              args=training_args,
              train_dataset=dataset,
          )
  
  tests/test_grpo_trainer.py:1279: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  trl/trainer/grpo_trainer.py:281: in __init__
      processing_class = AutoProcessor.from_pretrained(model.config._name_or_path, truncation_side="left")
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py:287: in from_pretrained
      processor_config_file = cached_file(pretrained_model_name_or_path, PROCESSOR_NAME, **cached_file_kwargs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:326: in cached_file
      file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:520: in cached_files
      _get_cache_file_to_return(path_or_repo_id, filename, cache_dir, revision, repo_type)
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:152: in _get_cache_file_to_return
      resolved_file = try_to_load_from_cache(
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
      validate_repo_id(arg_value)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  repo_id = ''
  
      def validate_repo_id(repo_id: str) -> None:
          """Validate `repo_id` is valid.
      
          This is not meant to replace the proper validation made on the Hub but rather to
          avoid local inconsistencies whenever possible (example: passing `repo_type` in the
          `repo_id` is forbidden).
      
          Rules:
          - Between 1 and 96 characters.
          - Either "repo_name" or "namespace/repo_name"
          - [a-zA-Z0-9] or "-", "_", "."
          - "--" and ".." are forbidden
      
          Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
      
          Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
      
          Example:
          ```py
          >>> from huggingface_hub.utils import validate_repo_id
          >>> validate_repo_id(repo_id="valid_repo_id")
          >>> validate_repo_id(repo_id="other..repo..id")
          huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
          ```
      
          Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
          In moon-landing (internal repository):
          - https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
          - https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
          """
          if not isinstance(repo_id, str):
              # Typically, a Path is not a repo_id
              raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
      
          if repo_id.count("/") > 1:
              raise HFValidationError(
                  "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
                  f" '{repo_id}'. Use `repo_type` argument if needed."
              )
      
          if not REPO_ID_REGEX.match(repo_id):
  >           raise HFValidationError(
                  "Repo id must use alphanumeric chars, '-', '_' or '.'."
                  " The name cannot start or end with '-' or '.' and the maximum length is 96:"
                  f" '{repo_id}'."
              )
  E           huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
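
In the traceback, the test passes `model` as a string id, yet `AutoProcessor.from_pretrained` receives `model.config._name_or_path`, which is `''` at that point. One possible guard is sketched below. The helper `pick_processor_source` is hypothetical and is not the fix adopted in TRL or transformers; it only illustrates falling back to the id the caller supplied when `_name_or_path` is empty.

```python
def pick_processor_source(model_arg, name_or_path: str) -> str:
    """Hypothetical helper: choose the id or path to load a processor from.

    model_arg: whatever the caller passed as `model` (a string id or a model object).
    name_or_path: the value read from `model.config._name_or_path`.
    """
    # Prefer the value recorded on the config when it is non-empty.
    if name_or_path:
        return name_or_path
    # Otherwise fall back to the id the caller originally supplied.
    if isinstance(model_arg, str) and model_arg:
        return model_arg
    # Fail early with a clear message instead of passing '' to the Hub validator.
    raise ValueError(
        "Cannot infer a processor source: `_name_or_path` is empty and `model` "
        "is not a string id. Pass a processor explicitly."
    )


if __name__ == "__main__":
    model_id = "trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration"
    print(pick_processor_source(model_id, ""))
```

A guard like this would only mask the symptom in the trainer; whether `_name_or_path` should be empty with the dev dependencies is a separate question for the upstream library.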

Labels

⚡ PEFT (Related to PEFT), 🏋 RLOO (Related to RLOO), 🏋 SFT (Related to SFT), 🐛 bug (Something isn't working)
