USE 112 - refactor model load #15

ghukill · 2025-10-28T20:22:06Z

Purpose and background context

This PR continues the refactoring of how the model is found and loaded for CLI commands, ahead of stubbing out the actual embeddings creation CLI commands.

Updates:

Rename TE_MODEL_DOWNLOAD_PATH to TE_MODEL_PATH to reflect that it is used for both downloading and loading
Always pass model_path when instantiating an embedding class and save it to self, then use it in methods like download() and load()
Continue to build out the CLI decorator @model_required so that it handles both the injection of CLI arguments and the loading of the model, all before the business logic of the CLI command runs

How can a reviewer manually see the effects of these changes?

No functional changes, just refactoring.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/USE-112

Code review

Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:
Many of the CLI commands will require an embedding class and model to work. A decorator was originally created to inject a --model-uri CLI argument, but it also provides a place to load the class itself and so act more like middleware.

How this addresses that need:
Updates the model_required decorator to also load the embedding model class. This DRYs up the CLI commands that use it and centralizes the logic and conventions for the CLI arguments, env vars, and so on. Lastly, a 'model_path' is now required when instantiating a model class, and this location is used for both download and load.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-112

ghukill · 2025-10-28T20:23:34Z

embeddings/models/base.py

+    def __init__(self, model_path: str | Path) -> None:
+        """Initialize the embedding model with a model path.
+
+        Args:
+            model_path: Path where the model will be downloaded to and loaded from.
+        """
+        self.model_path = Path(model_path)


Most of the changes in this PR trace back to this one. When we instantiate an embedding class, we always pass a model_path, which is then used for any downloading or loading of the model.

This removes the need to pass paths around to individual methods, and we can assume model_path is always set on the instance.

ghukill · 2025-10-28T20:25:21Z

embeddings/cli.py

+def model_required(f: Callable) -> Callable:
+    """Middleware decorator for commands that require an embedding model.
+
+    This decorator adds two CLI options:
+    - "--model-uri": defaults to environment variable "TE_MODEL_URI"
+    - "--model-path": defaults to environment variable "TE_MODEL_PATH"
+
+    The decorator intercepts these parameters, uses the model URI to identify and
+    instantiate the appropriate embedding model class with the provided model path,
+    and stores the model instance in the Click context at ctx.obj["model"].
+
+    Both model_uri and model_path parameters are consumed by the decorator and not
+    passed to the decorated command function.
+    """


As noted in the docstring, this decorator grew a bit in responsibility.

When applied to a CLI command, the decorator injects the arguments, and the command now gets back a nearly fully initialized embedding class instance, with model_path already set for use in download() or load() if the command does either of those.

This decorator was also moved within the file, hence the git diff showing it as entirely new.

ghukill · 2025-10-28T20:27:59Z

README.md

+  This CLI command is NOT used during normal workflows.  This is used primarily
+  during development and after model downloading/loading changes to ensure the
+  model loads correctly.


Updated docstring for this CLI command @ehanson8 😅

ehanson8 · 2025-10-30T14:27:36Z

Good change!

ehanson8 · 2025-10-30T14:33:12Z

embeddings/models/base.py

    @abstractmethod
-    def download(self, output_path: str | Path) -> Path:
-        """Download and prepare model, saving to output_path.
+    def download(self) -> Path:


💯 for the rename to model_path, more descriptive and just fits better throughout the code

ghukill added 2 commits October 28, 2025 16:17

Remove copy/paste Dockerfile comments

02264cb

ghukill requested a review from a team October 28, 2025 20:25

ghukill marked this pull request as ready for review October 28, 2025 20:25

ehanson8 approved these changes Oct 30, 2025

ghukill merged commit b3dbbb8 into main Oct 30, 2025
2 checks passed
