Skip to content

Conversation

@ghukill
Copy link
Collaborator

@ghukill ghukill commented Oct 28, 2025

Purpose and background context

This PR continues a bit of refactoring, going into the stubbing of actual embeddings creation CLI commands, around how the model is found and loaded for CLI commands.

Updates:

  • rename TE_MODEL_DOWNLOAD_PATH to TE_MODEL_PATH to reflect it's used for both download and loading
  • always pass model_path to an embedding class instance and save to self, then use for methods like download() and load()
  • continue to build out the CLI decorator @model_required to have it handle the injection of CLI arguments and the loading of the model, all before you hit the business logic of the CLI command

How can a reviewer manually see the effects of these changes?

No functional changes, just refactoring.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:

Many of the  CLI commands will require an embedding class and model to work.
A decorator was created originally that injected a --model-uri CLI argument,
but it also provides a place to load the class itself and become more of a
middleware.

How this addresses that need:
Updates the model_required decorator to also load the embedding model class.
This DRY's up the CLI commands that use it and centralizes that logic and
conventions for the CLI argument, env vars, and whatnot.

Lastly, it is now required to include a 'model_path' when instantiating
a model class instance, and this location is used for both download and
load.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-112
Comment on lines +15 to +21
def __init__(self, model_path: str | Path) -> None:
"""Initialize the embedding model with a model path.
Args:
model_path: Path where the model will be downloaded to and loaded from.
"""
self.model_path = Path(model_path)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes can be traced back to this change. When we instantiate an embedding class, we pass a model_path always, which can be used for any downloading or loading of the model.

This removes the need to pass around paths, and we can assume it's always required.

Comment on lines +50 to +63
def model_required(f: Callable) -> Callable:
"""Middleware decorator for commands that require an embedding model.
This decorator adds two CLI options:
- "--model-uri": defaults to environment variable "TE_MODEL_URI"
- "--model-path": defaults to environment variable "TE_MODEL_PATH"
The decorator intercepts these parameters, uses the model URI to identify and
instantiate the appropriate embedding model class with the provided model path,
and stores the model instance in the Click context at ctx.obj["model"].
Both model_uri and model_path parameters are consumed by the decorator and not
passed to the decorated command function.
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in the docstring, this decorator grew a bit in responsibility.

When applied to a CLI command, the arguments are injected, and now we get a nearly fully initialized embedding class instance back, with the model_path set for use in download() or load() if the CLI commands do either of those.

This decorator was also moved in the file, hence the git diff showing all new.

@ghukill ghukill requested a review from a team October 28, 2025 20:25
@ghukill ghukill marked this pull request as ready for review October 28, 2025 20:25
Comment on lines +86 to +88
This CLI command is NOT used during normal workflows. This is used primary
during development and after model downloading/loading changes to ensure the
model loads correctly.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docstring for this CLI command @ehanson8 😅

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change!

Comment on lines +86 to +88
This CLI command is NOT used during normal workflows. This is used primary
during development and after model downloading/loading changes to ensure the
model loads correctly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@abstractmethod
def download(self, output_path: str | Path) -> Path:
"""Download and prepare model, saving to output_path.
def download(self) -> Path:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for the rename to model_path, more descriptive and just fits better throughout the code

@ghukill ghukill merged commit b3dbbb8 into main Oct 30, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants