Skip to content

Support loading from model.safetensors.index.json #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 21, 2025

Conversation

findmyway
Copy link
Contributor

No description provided.

Copy link
Member

@pxl-th pxl-th left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Would be good to add a tests if possible.

@ToucheSir
Copy link
Member

I noticed that this index file is not covered anywhere in https://huggingface.co/docs/safetensors/index or the safetensors repo. Is it a huggingface-specific thing? Is there a documentation link we can point users to for it?

@findmyway
Copy link
Contributor Author

I noticed that this index file is not covered anywhere in https://huggingface.co/docs/safetensors/index or the safetensors repo. Is it a huggingface-specific thing? Is there a documentation link we can point users to for it?

See https://huggingface.co/docs/huggingface_hub/package_reference/hf_api#huggingface_hub.utils.SafetensorsRepoMetadata

@findmyway findmyway marked this pull request as draft March 20, 2025 03:44
@findmyway findmyway marked this pull request as ready for review March 20, 2025 05:29
@findmyway
Copy link
Contributor Author

LGTM! Would be good to add a tests if possible.

Added

Copy link
Member

@chengchingwen chengchingwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index.json is defined by huggingface in their python binding for sharding the model weights and is not part of the safetensors format/spec. This should not be in this package (or at least, it should be named as something like load_shard_safetensors instead of modifying load_safetensors.

@findmyway
Copy link
Contributor Author

Hi @chengchingwen ,

I agree this feature is not part of the spec. The reason is that in real world cases, different packages may have different approaches to handle the shards (like loading them distributedly or GC during loading). However, I think the modifications I made here provide a nice-to-have fallback (by loading them all in memory).

Regarding the naming issue, reusing the load_safetensors won't create any breaking change. But I'd be happy to change if you insist.

@chengchingwen
Copy link
Member

The reason is that in real world cases, different packages may have different approaches to handle the shards

It's also part of the reason that I think it should not be in this package, but I agree it would be convenient to have and is a reasonable default for loading sharded weights. Personally, the index.json is kind of unsafe regarding the original intention of safetensors. Meanwhile, it doesn't have a stable format/spec which make things break easily and inconspicuously if any changes have been made to their python code. Adding new function also won't be a breaking change and could provide a clear separation between spec and custom behavior, so I would insist the name change if we are going to merge it.

@chengchingwen
Copy link
Member

Also, some unneeded files should be removed. e.g. test/README.md

@findmyway findmyway requested a review from chengchingwen March 21, 2025 09:22
@findmyway findmyway requested a review from chengchingwen March 21, 2025 13:39
@chengchingwen chengchingwen merged commit a720c55 into FluxML:main Mar 21, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants