mixer3d
Contributor

@mixer3d mixer3d commented Apr 18, 2025

Added case to huggingface-cli download when YAML metadata is missing

Member

@julien-c julien-c left a comment

I don't understand the need for this special case, but cc @Wauplin for when he's back.

@hanouticelina
Contributor

@mixer3d Not sure I understand this special case either. If you want to download a single file, you always need to provide the filenames argument, which is the path to the file relative to the repository root.
I'm closing this PR for now, but feel free to comment if you want to provide more details or clarification.

@mixer3d
Contributor Author

mixer3d commented Apr 21, 2025

Hi @hanouticelina, the point is that the CLI documentation for downloads says:

### Download a single file
To download a single file from a repo, simply provide the repo_id and filename as follow:
```bash
>>> huggingface-cli download gpt2 config.json
```

But this only works in certain situations, in well-prepared repositories. Recently I spotted at least a few repositories where you cannot download a single file that way. To work around it, you have to provide the full path to the file, as in the example in my commit, and this is covered neither in the documentation nor in the CLI help. I found a few people searching for a solution to the same problem, so when I saw that the documentation for the `huggingface-cli download` command is vague on this point, I wanted to add that information. One example with a full path is probably enough to cover all situations, but I'm not sure. The traceback from the CLI was also not clear enough for me:

...line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 
'https://huggingface.co/Comfy-Org/HiDream-I1_ComfyUI/resolve/main/split_files/text_encoders/llama_3.1_8b_instruct_fp8_scaled.safetensors'. 
Use `repo_type` argument if needed.

In the end I found other users with a similar question, and someone on YouTube pointed me in the right direction by using a different syntax than the one in the documentation. I then tried it that way and found that it works with the full path.
I now understand how it works, so I don't need this for myself; I just wanted to contribute for others who get stuck the same way. If you think it's too much, I'm fine with that. :)
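
For readers who hit the same error: the traceback above shows an entire file URL being validated as a repo id, whereas the CLI expects the repo id and the file path as two separate arguments. Below is a minimal sketch of splitting such a URL into those pieces. The helper name `split_resolve_url` is hypothetical (it is not part of `huggingface_hub`), and the URL layout `<namespace>/<repo>/resolve/<revision>/<path>` is an assumption inferred from the URL in the traceback; dataset or Space URLs with an extra prefix are deliberately rejected.

```python
import shlex
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_resolve_url(url: str) -> Tuple[str, str, str]:
    """Split a Hub file URL into (repo_id, revision, filename).

    Hypothetical helper. Assumes the path layout
    [<namespace>/]<repo>/resolve/<revision>/<path/to/file>.
    """
    # Keep only non-empty, URL-decoded path segments.
    segments = [unquote(s) for s in urlparse(url).path.split("/") if s]
    if "resolve" not in segments:
        raise ValueError(f"not a '/resolve/' file URL: {url!r}")

    marker = segments.index("resolve")
    repo_segments = segments[:marker]   # 'repo_name' or 'namespace', 'repo_name'
    rest = segments[marker + 1:]        # revision, then the file path

    if len(repo_segments) not in (1, 2) or len(rest) < 2:
        raise ValueError(f"unexpected URL layout: {url!r}")

    repo_id = "/".join(repo_segments)
    revision = rest[0]
    filename = "/".join(rest[1:])       # path relative to the repo root
    return repo_id, revision, filename


if __name__ == "__main__":
    url = (
        "https://huggingface.co/Comfy-Org/HiDream-I1_ComfyUI/resolve/main/"
        "split_files/text_encoders/llama_3.1_8b_instruct_fp8_scaled.safetensors"
    )
    repo_id, revision, filename = split_resolve_url(url)
    # Print the equivalent CLI invocation: repo id first, then the file path.
    print(shlex.join(
        ["huggingface-cli", "download", repo_id, filename, "--revision", revision]
    ))
```

The sketch only rearranges strings; it does not contact the Hub, so it cannot tell whether the file actually exists in the repository.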

@Wauplin
Contributor

Wauplin commented Apr 22, 2025

@mixer3d I feel like the initial intent is good (i.e. documenting that it's possible to download a file from a filepath in the repo, like this: `huggingface-cli download Comfy-Org/HiDream-I1_ComfyUI split_files/text_encoders/llama_3.1_8b_instruct_fp8_scaled.safetensors`) but that the opened PR is very specific to your use case. In practice, the _Metadata Warning: empty or missing yaml metadata in repo card_ is not related at all to the download of the safetensors file. If it is related in your workflow, then I'd be interested to know what it was and how this command fixes it.

@mixer3d
Contributor Author

mixer3d commented Apr 22, 2025

@Wauplin Hi, thanks for the reply. To be honest, I was not sure whether the issue with that repository was related to the Metadata Warning, but I did see that warning there. In other repositories, where it was possible to download a file without an explicit path, the metadata was present. So I thought it could be related, and the full path did solve the issue in this particular example. I'm not sure how to address it correctly. The whole point of the PR was to cover the situation (whose cause I don't know) where the documented way of downloading a single file doesn't work, in which case the user can try the full path instead. I did not dig deep enough to confirm whether missing metadata was the cause, and the traceback from the CLI could also point to something else.

@Wauplin
Contributor

Wauplin commented Apr 22, 2025

In that case, I would just add a quick snippet inside the `### Download a single file` section with something like this:

### Download a single file

(...) # existing stuff

To download a file located in a subdirectory of the repo, you should provide the path of the file in the repo in posix format like this:

```bash
>>> huggingface-cli download HiDream-ai/HiDream-I1-Full text_encoder/model.safetensors
```

### Download an entire repository

Happy to review a PR with that change :)
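
As a rough illustration of what "posix format" means for the path argument, here is a small sketch that normalizes a path typed with Windows-style backslashes into the forward-slash form and assembles the command arguments. Both helper names (`to_repo_path`, `build_download_command`) are hypothetical and not part of `huggingface_hub`; the sketch also assumes repo paths never contain a literal backslash in a file name.

```python
import shlex
from pathlib import PureWindowsPath
from typing import List


def to_repo_path(path: str) -> str:
    """Return `path` as a POSIX-style path relative to the repo root.

    Hypothetical helper: PureWindowsPath accepts both "\\" and "/" as
    separators, so it can normalize either spelling without touching disk.
    """
    parsed = PureWindowsPath(path)
    # Reject anything that is not a plain relative path inside the repo.
    if parsed.anchor or ".." in parsed.parts:
        raise ValueError(f"expected a path relative to the repo root: {path!r}")
    return parsed.as_posix()


def build_download_command(repo_id: str, path: str) -> List[str]:
    """Assemble the argument list: repo id first, then the file path."""
    return ["huggingface-cli", "download", repo_id, to_repo_path(path)]


if __name__ == "__main__":
    cmd = build_download_command(
        "HiDream-ai/HiDream-I1-Full", r"text_encoder\model.safetensors"
    )
    print(shlex.join(cmd))
    # prints: huggingface-cli download HiDream-ai/HiDream-I1-Full text_encoder/model.safetensors
```

Returning an argument list rather than a single string keeps the sketch usable with `subprocess.run(cmd)` without any shell quoting concerns.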

mixer3d added a commit to mixer3d/huggingface_hub that referenced this pull request Apr 22, 2025
@mixer3d
Contributor Author

mixer3d commented Apr 22, 2025

@Wauplin ok done: #3023

@Wauplin
Contributor

Wauplin commented Apr 23, 2025

Thanks! Approved and merged 🤗

clrpackages pushed a commit to clearlinux-pkgs/pypi-huggingface_hub that referenced this pull request May 14, 2025
…2 to version 0.31.1

Adrien (1):
      fix: fix test_get_hf_file_metadata_from_a_lfs_file as since xet migration (#2972)

Albert Thomas (1):
      fix default CACHE_DIR (#3050)

Brian Ronan (2):
      Removing workaround for deprecated refresh route headers (#2993)
      Xet Upload with byte array (#3035)

Celina Hanouti (3):
      Release: v0.31.0.rc0
      Release: v0.31.0
      Release: v0.31.1

Emmanuel Ferdman (1):
      Migrate to `logger.warning` usage (#3056)

Francesco Capuano (1):
      Super-micro-tiny-PR to allow for direct copy-paste :) (#3030)

HuggingFaceInfra (3):
      Update inference types (automated commit) (#2933)
      Update inference types (automated commit) (#3015)
      Update inference types (automated commit) (#3051)

Julien Chaumond (2):
      PoC: `provider="auto"` (#3011)
      [inference] Necessary breaking change: nest task-specific route inside of model route (#3044)

Lucain (4):
      Fix 'sentence-transformers/all-MiniLM-L6-v2' doesn't support task 'feature-extraction' (#2968)
      Unlist TPUs from SpaceHardware (#2973)
      Fix HfInference conversational (#2985)
      Retry on transient error in download workflow (#2976)

Lucain Pouget (1):
      better

MaCAT (1):
      Fix dynamic commit size (#3016)

Rajat Arya (3):
      Add HTTP Download support for files > 50GB (#2991)
      Docs for xet env variables (#3024)
      Minor xet changes: HF_HUB_DISABLE_XET flag, suppress logger.info (#3039)

Tom Aarsen (2):
      Fix 'sentence_similarity' on InferenceClient (#3004)
      Add the 'env' parameter to creating/updating Inference Endpoints (#3045)

Yağız Çalık (1):
      Handle Rate Limits in Pagination with Automatic Retries (#2970)

célina (16):
      A better security-wise style bot GH Action (#2914)
      fix text generation (#2982)
      prepare for next release (#2983)
      Bump `hf_xet` min version to 1.0.0 + make it required dep on 64 bits (#2971)
      Improve error handling for invalid eval results in model cards (#3000)
      fix permissions for style bot (#3012)
      update text to speech input (#3025)
      remove (inference only) VCR tests (#3021)
      remove test (#3028)
      fix snapshot download behavior in offline mode when downloading to a local dir (#3009)
      [Inference Providers] Support for LoRAs (#3005)
      [Inference Providers] sambanova supports feature extraction (#3037)
      fix docstring (#3040)
      [Inference Providers] fix inference with URL endpoints (#3041)
      support loras with replicate (#3054)
      fix conda (#3058)

mixer3d (1):
      Add example for downloading files in subdirectories, related to huggingface/huggingface_hub#3014 (#3023)

narugo1992 (1):
      dev(narugo): disable hf_transfer when custom 'Range' header is assigned (#2979)

vb (1):
      add route payload to deploy Inference Endpoints (#3013)

4 participants