Enable loading model from hub that has already been converted #13
Conversation
@echarlaix thank you for adding support for loading converted and cached ExecuTorch models from the hub. We are thinking of supporting the same use cases, and I'd like to get your insights on how we can collaborate on it. To make the discussion easier I opened a new GitHub issue here: #15
```python
    model_id: Union[str, Path],
    config: "PretrainedConfig" = None,
    subfolder: str = "",
    recipe: str = "",
```
nit: It can still be a path to a local directory, right? Isn't the name confusing?
Yes! I used `model_id` to match what we currently have for `OptimizedModel` and `ORTModel`, but if you think this is not clear we can think about an alternative.
```python
    def _save_pretrained(self, save_directory):
```
I think the `_save_pretrained` logic is implemented already: when we export to ExecuTorch we save the pte to the local filesystem. I think we can just move that logic into this API, to provide HF users a consistent experience when working with the ExecuTorch backend.
We need to save the resulting pte file in `_save_pretrained` (which currently just does `raise NotImplementedError`); this can be done in a following PR.
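A minimal sketch of what that could look like, assuming the model object keeps track of the exported program's location in `self.model_save_dir` and `self.file_name` (hypothetical attribute names, not taken from this PR):

```python
import shutil
from pathlib import Path

def _save_pretrained(self, save_directory):
    # Copy the already-exported .pte program into the target directory so that
    # save_pretrained() round-trips with from_pretrained() on a local folder.
    save_directory = Path(save_directory)
    save_directory.mkdir(parents=True, exist_ok=True)
    src = Path(self.model_save_dir) / self.file_name  # hypothetical attributes
    shutil.copyfile(src, save_directory / src.name)
```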
```python
model_path = Path(model_path)
# Locate the file in a local folder or in a repo; download and cache it if necessary.
if model_path.is_dir():
    model_cache_path = os.path.join(model_path, subfolder, file_name)
else:
    model_cache_path = hf_hub_download(
```
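For context, a hedged sketch of how the remote branch of this excerpt typically completes; the variable names (`revision`, `cache_dir`, etc.) are assumed to be in scope in `_from_pretrained` and are not copied from the PR:

```python
from huggingface_hub import hf_hub_download

# hf_hub_download resolves the file from the local HF cache first and only
# downloads it from the hub when it is missing or stale.
model_cache_path = hf_hub_download(
    repo_id=str(model_path),
    filename=file_name,
    subfolder=subfolder or None,
    revision=revision,
    cache_dir=cache_dir,
)
```

This cache-first behaviour is also why an already-downloaded pte is reused rather than re-downloaded.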
IIUC it will try loading from a local directory only if the `model_id` is a local path, i.e. `from_pretrained(model_id="my/local/path/to/executorch/model/")`. If the `model_id` is a hub id like `meta-llama/Llama-3.2-1B`, it will download it from the hub. There is a case where the pte is already cached locally when calling `from_pretrained(model_id="meta-llama/Llama-3.2-1B")`; shouldn't it prioritize returning the locally cached one instead?
> There is a case where the pte is already cached locally when calling `from_pretrained(model_id="meta-llama/Llama-3.2-1B")`; shouldn't it prioritize returning the locally cached one instead?
This will happen only for cases where the pte file is already present in the hub model repo (for example https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch), so for `ExecuTorchModelForCausalLM.from_pretrained("optimum-internal-testing/tiny-random-llama", revision="executorch")`, but this won't be the case for https://huggingface.co/meta-llama/Llama-3.2-1B/tree/main as currently no pte files are present there.
Is your point that the pte file should be saved in the cache after export? I'm not sure this is something that we want to do, as it could result in conflicts.
tests/models/test_modeling.py
```python
def test_load_et_model_from_hub(self):
    model_id = "optimum-internal-testing/tiny-random-llama"

    model = ExecuTorchModelForCausalLM.from_pretrained(model_id, revision="executorch", recipe="xnnpack")
```
I'm confused about the difference with and without `revision="executorch"`. I guess the underlying question is: what does the `revision` parameter do?
Oh, I see what it is: https://huggingface.co/optimum-internal-testing/tiny-random-llama/tree/executorch. Here is the follow-up question: when `revision="main"`, the pte doesn't exist there, so what would happen?
If no pte files are detected then `export` will be set to `True` and the model will be converted to ExecuTorch on the fly. After this the user can either save the resulting model locally with `.save_pretrained(save_dir)` or push it directly to the hub with `.push_to_hub(repo_id)` (both methods need to be implemented, let me know if you're interested in tackling this in a following PR).
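To make the two paths concrete, here is a hedged sketch of the intended usage; the import path is assumed, and `save_pretrained`/`push_to_hub` are the follow-ups mentioned above rather than methods implemented in this PR:

```python
from optimum.executorch import ExecuTorchModelForCausalLM

# Case 1: the repo revision already contains a .pte file, so it is downloaded and loaded directly.
model = ExecuTorchModelForCausalLM.from_pretrained(
    "optimum-internal-testing/tiny-random-llama", revision="executorch", recipe="xnnpack"
)

# Case 2: no .pte file in the repo, so export is inferred to be True and the model
# is converted to ExecuTorch on the fly.
model = ExecuTorchModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", recipe="xnnpack")

# Planned follow-ups (not part of this PR):
# model.save_pretrained("local_dir")                    # save the resulting .pte locally
# model.push_to_hub("my-org/Llama-3.2-1B-executorch")   # or push it to the hub
```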
I looked through the PR but it's still unclear to me where exactly the cached model is fetched from on the hub. Let's take "meta-llama/Llama-3.2-1B" for example.
Is it fetching the cached model from https://huggingface.co/executorch-community/Llama-3.2-1B-Instruct, assuming there is an entry like that?
Or is it fetching the cached model from https://huggingface.co/meta-llama/Llama-3.2-1B under the "Files and versions" tab?
The logic that publishes the pte to the hub doesn't exist yet and will be enabled in a separate PR, right?
Co-authored-by: Guang Yang <[email protected]>
If the … can be integrated via a …
Merging, and we can add more features (saving the exported model locally, pushing to the hub, docs) in following PRs. cc @guangy10
Enable loading models from the hub that have already been converted to ExecuTorch, like this model.
Infer whether the model should be exported or not by checking whether a `.pte` file is present when loading the model.
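A hedged sketch of how that inference could look; `should_export` is a hypothetical helper written for illustration, not the PR's actual code:

```python
from pathlib import Path
from huggingface_hub import list_repo_files

def should_export(model_id: str, revision: str = "main", subfolder: str = "") -> bool:
    """Return True when no .pte file is found, i.e. the model still needs to be exported."""
    model_path = Path(model_id) / subfolder if subfolder else Path(model_id)
    if model_path.is_dir():
        # Local directory: look at the files on disk.
        files = [p.name for p in model_path.iterdir()]
    else:
        # Hub repo id: list the files of the requested revision.
        files = list_repo_files(model_id, revision=revision)
    return not any(f.endswith(".pte") for f in files)
```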