
Enable loading model from hub that has already been converted #13

Merged — 24 commits merged into main on Feb 12, 2025

Conversation

echarlaix (Collaborator) commented Feb 3, 2025

  • Enable loading models from the Hub that have already been converted to ExecuTorch, like this model

  • Infer whether the model should be exported or not by checking whether a .pte file is already present when loading the model (a sketch of this check follows the example below)

from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "optimum-internal-testing/tiny-random-llama"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, revision="executorch")
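
A minimal sketch of how such a "should we export?" check could work, assuming the decision is based purely on the presence of a .pte file; the should_export helper and its arguments are illustrative, not the PR's actual implementation:

from pathlib import Path
from huggingface_hub import HfApi

def should_export(model_id: str, revision: str = "main") -> bool:
    # Local directory: export only if no .pte file is already present.
    path = Path(model_id)
    if path.is_dir():
        return not any(path.glob("*.pte"))
    # Hub repo id: export only if the repo contains no .pte file.
    files = HfApi().list_repo_files(model_id, revision=revision)
    return not any(f.endswith(".pte") for f in files)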

@echarlaix echarlaix marked this pull request as ready for review February 3, 2025 18:29
@echarlaix echarlaix changed the title Load from hub Enable loading model from hub that has already been converted Feb 3, 2025
guangy10 (Collaborator) commented Feb 3, 2025

@echarlaix thank you for adding support for loading converted and cached ExecuTorch models from the Hub. We are thinking of supporting the same use cases, and I'd like to get your insights on how we can collaborate on it. To make the discussion easier I opened a new GitHub issue here: #15

recipe: str = "",
config: "PretrainedConfig" = None,
subfolder: str = "",
model_id: Union[str, Path],
Collaborator

nit: It can still be a path to a local directory, right? Isn't the name confusing?


def _save_pretrained(self, save_directory):
Collaborator

I think the _save_pretrained logic is implemented already: when we export to ExecuTorch we save the .pte to the local filesystem. I think we can just move that logic into this API, to give HF users a consistent experience when working with the ExecuTorch backend.

Collaborator Author

We need to save the resulting .pte file in _save_pretrained instead of the current

raise NotImplementedError

This can be done in a follow-up PR.
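
A minimal sketch of what such a _save_pretrained could look like, assuming the model object keeps track of the path of the exported program; the model_save_path attribute and the model.pte file name are assumptions, not the repo's actual names:

import shutil
from pathlib import Path

def _save_pretrained(self, save_directory):
    # Copy the exported .pte program into the target directory,
    # assuming `self.model_save_path` points at the exported file.
    save_directory = Path(save_directory)
    save_directory.mkdir(parents=True, exist_ok=True)
    shutil.copy(self.model_save_path, save_directory / "model.pte")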

Comment on lines +186 to +191
model_path = Path(model_path)
# locates a file in a local folder and repo, downloads and cache it if necessary.
if model_path.is_dir():
model_cache_path = os.path.join(model_path, subfolder, file_name)
else:
model_cache_path = hf_hub_download(
Collaborator

IIUC it will try loading from a local directory only if model_id is a local path, i.e. from_pretrained(model_id=my/local/path/to/executorch/model/). If model_id is a Hub id like meta-llama/Llama-3.2-1B, it will download the cached one from the Hub. There is a case where the .pte is already cached locally when calling from_pretrained(model_id=meta-llama/Llama-3.2-1B); shouldn't it prioritize returning the locally cached one instead?

Collaborator Author

There is a case where the .pte is already cached locally when calling from_pretrained(model_id=meta-llama/Llama-3.2-1B); shouldn't it prioritize returning the locally cached one instead?

This will only happen when the .pte file is already present in the Hub model repo (for example https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch), so for

ExecuTorchModelForCausalLM.from_pretrained("optimum-internal-testing/tiny-random-llama", revision="executorch")

but this won't be the case for https://huggingface.co/meta-llama/Llama-3.2-1B/tree/main, as currently no .pte files are present there.

Is your point that the .pte file should be saved in the cache after export? I'm not sure this is something we want to do, as it could result in conflicts.
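
For reference, a minimal sketch of the local-directory vs. Hub resolution being discussed; the resolve_pte helper and its defaults are illustrative, only hf_hub_download is the real huggingface_hub API:

import os
from pathlib import Path
from huggingface_hub import hf_hub_download

def resolve_pte(model_path, file_name="model.pte", subfolder="", revision=None, cache_dir=None):
    model_path = Path(model_path)
    if model_path.is_dir():
        # Local directory: use the file directly from disk.
        return os.path.join(model_path, subfolder, file_name)
    # Hub repo id: download the .pte file, or reuse it from the local HF cache.
    return hf_hub_download(
        repo_id=str(model_path),
        filename=file_name,
        subfolder=subfolder or None,
        revision=revision,
        cache_dir=cache_dir,
    )

Note that hf_hub_download reuses the local Hugging Face cache when the file has already been downloaded, so repeated calls with the same Hub id don't re-download the .pte.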

Comment on lines 42 to 45
def test_load_et_model_from_hub(self):
model_id = "optimum-internal-testing/tiny-random-llama"

model = ExecuTorchModelForCausalLM.from_pretrained(model_id, revision="executorch", recipe="xnnpack")
guangy10 (Collaborator) commented Feb 8, 2025

I'm confused about the difference with and without revision="executorch". I guess the underlying question is: what does the "revision" parameter do?

Oh, I see what it is: https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch. Here is the follow-up question: when revision="main", the .pte doesn't exist there; what would happen?

Collaborator Author

If no .pte file is detected then export will be set to True and the model will be converted to ExecuTorch on the fly. After that, the user can either save the resulting model locally with .save_pretrained(save_dir) or push it directly to the Hub with .push_to_hub(repo_id) (both methods need to be implemented; let me know if you're interested in tackling this in a follow-up PR).
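
In other words, the intended flow would look roughly like this; save_pretrained and push_to_hub are the methods discussed above that still need to be implemented, and "my-username" is a placeholder, so this is an assumed future usage rather than working code today:

from optimum.executorch import ExecuTorchModelForCausalLM

# No .pte on the main revision, so the model is exported to ExecuTorch on the fly.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "optimum-internal-testing/tiny-random-llama", recipe="xnnpack"
)

# Then either keep the exported program locally ...
model.save_pretrained("./tiny-random-llama-executorch")
# ... or publish it to the Hub (method still to be implemented).
model.push_to_hub("my-username/tiny-random-llama-executorch")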

guangy10 (Collaborator) left a comment

I looked through the PR but it's still unclear to me where exactly the cached model is fetched from on the Hub. Let's take "meta-llama/Llama-3.2-1B" for example.

Is it fetching the cached model from https://huggingface.co/executorch-community/Llama-3.2-1B-Instruct, assuming there is an entry like that?

Or is it fetching the cached model from https://huggingface.co/meta-llama/Llama-3.2-1B under the "Files and versions" tab?

guangy10 (Collaborator) commented Feb 8, 2025

The logic that publishes the .pte to the Hub doesn't exist yet and will be enabled in a separate PR, right?

echarlaix (Collaborator Author)

I looked through the PR but it's still unclear to me where exactly the cached model is fetched from on the Hub. Let's take "meta-llama/Llama-3.2-1B" for example. Is it fetching the cached model from https://huggingface.co/executorch-community/Llama-3.2-1B-Instruct, assuming there is an entry like that? Or is it fetching the cached model from https://huggingface.co/meta-llama/Llama-3.2-1B under the "Files and versions" tab?

If model_id is set to meta-llama/Llama-3.2-1B, then it is fetched from https://huggingface.co/meta-llama/Llama-3.2-1B.
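
So to load an already converted program from a different repo, that repo id is what would be passed as model_id; executorch-community/Llama-3.2-1B-Instruct here is the hypothetical repo from the question above:

from optimum.executorch import ExecuTorchModelForCausalLM

# The .pte is looked up in (and fetched from) the repo passed as model_id.
model = ExecuTorchModelForCausalLM.from_pretrained("executorch-community/Llama-3.2-1B-Instruct")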

The logic that publishes the .pte to the Hub doesn't exist yet and will be enabled in a separate PR, right?

Yes, it can be integrated via a push_to_hub method, see #13 (comment).

echarlaix (Collaborator Author)

Merging, and we can add more features (saving the exported model locally, pushing to the Hub, docs) in follow-up PRs. cc @guangy10

@echarlaix echarlaix merged commit 453123f into main Feb 12, 2025
22 checks passed