Enable loading model from hub that has already been converted #13
Conversation
@echarlaix thank you for adding support for loading converted and cached ExecuTorch models from the hub. We are thinking of supporting the same use cases, and I'd like to get your insights on how we can collaborate on it. To make the discussion easier I opened a new GitHub issue here: #15
```python
    model_id: Union[str, Path],
    config: "PretrainedConfig" = None,
    subfolder: str = "",
    recipe: str = "",
```
nit: It can still be a path to a local directory, right? Isn't the name confusing?
Yes! I used `model_id` to match what we currently have for `OptimizedModel` and `ORTModel`, but if you think this is not clear we can think about an alternative.
```python
    def _save_pretrained(self, save_directory):
```
I think the `_save_pretrained` logic is implemented already: when we export to ExecuTorch we save the pte to the local filesystem. I think we can just move that logic into this API, to provide HF users a consistent experience when working with the ExecuTorch backend.
We need to save the resulting pte file in `_save_pretrained` (which currently just does `raise NotImplementedError`); this can be done in a following PR.
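A minimal sketch of what that could look like, assuming the model object keeps track of the exported program's location in `self.model_save_dir` and `self.file_name` (hypothetical attribute names, not taken from this PR):

```python
import shutil
from pathlib import Path

def _save_pretrained(self, save_directory):
    # Copy the already-exported .pte program into the target directory so that
    # save_pretrained() round-trips with from_pretrained() on a local folder.
    save_directory = Path(save_directory)
    save_directory.mkdir(parents=True, exist_ok=True)
    src = Path(self.model_save_dir) / self.file_name  # hypothetical attributes
    shutil.copyfile(src, save_directory / src.name)
```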
```python
model_path = Path(model_path)
# Locate the file in a local folder or in a repo; download and cache it if necessary.
if model_path.is_dir():
    model_cache_path = os.path.join(model_path, subfolder, file_name)
else:
    model_cache_path = hf_hub_download(
```
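For context, a hedged sketch of how the remote branch of this excerpt typically completes; the variable names (`revision`, `cache_dir`, etc.) are assumed to be in scope in `_from_pretrained` and are not copied from the PR:

```python
from huggingface_hub import hf_hub_download

# hf_hub_download resolves the file from the local HF cache first and only
# downloads it from the hub when it is missing or stale.
model_cache_path = hf_hub_download(
    repo_id=str(model_path),
    filename=file_name,
    subfolder=subfolder or None,
    revision=revision,
    cache_dir=cache_dir,
)
```

This cache-first behaviour is also why an already-downloaded pte is reused rather than re-downloaded.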
IIUC it will try loading from a local directory only if the `model_id` is a local path, i.e. `from_pretrained(model_id="my/local/path/to/executorch/model/")`. If the `model_id` is a hub id like `meta-llama/Llama-3.2-1B`, it will download it from the hub. There is a case where the pte is already cached locally when calling `from_pretrained(model_id="meta-llama/Llama-3.2-1B")`; shouldn't it prioritize returning the locally cached one instead?
> There is a case where the pte is already cached locally when calling `from_pretrained(model_id="meta-llama/Llama-3.2-1B")`; shouldn't it prioritize returning the locally cached one instead?
This will happen only for cases where the pte file is already present in the hub model repo (for example https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch), so for `ExecuTorchModelForCausalLM.from_pretrained("optimum-internal-testing/tiny-random-llama", revision="executorch")`, but this won't be the case for https://huggingface.co/meta-llama/Llama-3.2-1B/tree/main as currently no pte files are present there.
Is your point that the pte file should be saved in the cache after export? I'm not sure this is something that we want to do, as it could result in conflicts.
tests/models/test_modeling.py
```python
def test_load_et_model_from_hub(self):
    model_id = "optimum-internal-testing/tiny-random-llama"

    model = ExecuTorchModelForCausalLM.from_pretrained(model_id, revision="executorch", recipe="xnnpack")
```
I'm confused about the difference with and without `revision="executorch"`. I guess the underlying question is: what does the `revision` parameter do?
Oh, I see what it is: https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch. Here is the follow-up question: when `revision="main"`, the pte doesn't exist there, so what would happen?
If no pte files are detected then `export` will be set to `True` and the model will be converted to ExecuTorch on the fly. After this the user can either save the resulting model locally with `.save_pretrained(save_dir)` or push it directly to the hub with `.push_to_hub(repo_id)` (both methods need to be implemented, let me know if you're interested in tackling this in a following PR).
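To make the two paths concrete, here is a hedged sketch of the intended usage; the import path is assumed, and `save_pretrained`/`push_to_hub` are the follow-ups mentioned above rather than methods implemented in this PR:

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# Case 1: the repo revision already contains a .pte file, so it is downloaded and loaded directly.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "optimum-internal-testing/tiny-random-llama", revision="executorch", recipe="xnnpack"
)

# Case 2: no .pte file in the repo, so export is inferred to be True and the model
# is converted to ExecuTorch on the fly.
model = ExecuTorchModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", recipe="xnnpack")

# Planned follow-ups (not part of this PR):
# model.save_pretrained("local_dir")                    # save the resulting .pte locally
# model.push_to_hub("my-org/Llama-3.2-1B-executorch")   # or push it to the hub
```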
I looked through the PR but it's still unclear to me where exactly the cached model is fetched from on the hub. Let's take "meta-llama/Llama-3.2-1B" for example.
Is it fetching the cached model from https://huggingface.co/executorch-community/Llama-3.2-1B-Instruct, assuming there is an entry like that?
Or is it fetching the cached model from https://huggingface.co/meta-llama/Llama-3.2-1B under the "Files and versions" tab?
The logic that publishes the pte to the hub doesn't exist yet and will be enabled in a separate PR, right?
Co-authored-by: Guang Yang <[email protected]>
If the … can be integrated via a …
Merging, and we can add more features (saving the exported model locally, pushing to the hub, docs) in following PRs. cc @guangy10
Enable loading models from the hub that have already been converted to ExecuTorch, like this model.
Infer whether the model should be exported or not by checking whether a `.pte` file is present when loading the model.
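A hedged sketch of how that inference could look; `should_export` is a hypothetical helper written for illustration, not the PR's actual code:

```python
from pathlib import Path
from huggingface_hub import list_repo_files

def should_export(model_id: str, revision: str = "main", subfolder: str = "") -> bool:
    """Return True when no .pte file is found, i.e. the model still needs to be exported."""
    model_path = Path(model_id) / subfolder if subfolder else Path(model_id)
    if model_path.is_dir():
        # Local directory: look at the files on disk.
        files = [p.name for p in model_path.iterdir()]
    else:
        # Hub repo id: list the files of the requested revision.
        files = list_repo_files(model_id, revision=revision)
    return not any(f.endswith(".pte") for f in files)
```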