Skip to content

[core] respect local_files_only=True when using sharded checkpoints #12005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

sayakpaul
Copy link
Member

What does this PR do?

See: #11948

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu July 28, 2025 14:48
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -403,9 +404,26 @@ def _get_checkpoint_shard_files(

ignore_patterns = ["*.json", "*.md"]
# `model_info` call must guarded with the above condition.
model_files_info = model_info(pretrained_model_name_or_path, revision=revision, token=token)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the purpose of this check is to verify if the necessary sharded files are present in the model repo before attempting a download, presumably to avoid a large download if all files aren't present. If we cannot connect to the hub, we just have to assume the necessary shard files are already present locally.

I think we can just skip this check if local_files_only=True and then check if all the shard filenames are present in the cached_folder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about now?

Copy link
Collaborator

@DN6 DN6 Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just this is sufficient

if not local_files_only:
    # run model_info check

Run snapshot download

Then after the cached_filenames is created, iterate over the files to verify they exist

for filename in cached_filename:
      if not if not os.path.exists(filename):
           raise EnvironmentError("expected file not present in {cached_folder}")

Copy link
Member Author

@sayakpaul sayakpaul Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We don't have to run snapshot_download() when local_files_only=False, that might be unnecessary.
  2. Why run snapshot_download() after also running model_info()?
  3. Even if we run snapshot_download() regardless of local_files_only var, I think we should have it inside try-except in case the endpoint cannot be pinged for some reason and raise the ConnectionError as before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if b7af511 resolves this.

Copy link
Member Author

@sayakpaul sayakpaul Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean. Let me update. Sorry about the back and forth.

@sayakpaul sayakpaul requested a review from DN6 August 12, 2025 14:57
@sayakpaul
Copy link
Member Author

@DN6 see if the latest changes work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants