Skip to content

Conversation

@ghukill
Copy link
Collaborator

@ghukill ghukill commented Oct 22, 2025

Purpose and background context

This PR advances USE-112 (CLI stubs) and USE-113 (Dockerfile building) by introducing some structure to represent models in this CLI application.

To build a Docker image that includes the model itself, we will need some opinionated code that understands how to download and configure the model.

The proposed approach is to represent each model as an instance of a base BaseEmbeddingModel class. This base class will require methods like:

  • download(): download the model from HuggingFace and zip it up for future use
  • load(): load the model from the zipped up, local copy (not yet stubbed)
  • create_embeddings(): entrypoint for actually creating embeddings (not yet stubbed)

At this time, only download() is stubbed.

We also have our first "real" model OSNeuralSparseDocV3GTE, though only minimal stubs.

Lastly, the CLI command download-model is added which will be used by the Dockerfile during build as we move into that work.

All in all, this PR represents a base model architecture and stubbed methods. No real functionality at this time.

How can a reviewer manually see the effects of these changes?

TODO...

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:

For this application to create embeddings, it will need some
structure to handle the downloading, loading, and use of
particular models.

Additionally, this application should provide the opinionated
functionality to download and zip up a model, either for use
within a Docker image, or local testing.

How this addresses that need:
* Class 'BaseEmbeddingModel' is created.  This base class will be
extended by actual models we intend to use.  This base class will
expect any child class to define methods for downloading, loading,
and using the model.

* Stubs a CLI command 'download-model' which we will use very
soon for building a Docker image that contains the model weights.

Side effects of this change:
* None at this time.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-112
* https://mitlibraries.atlassian.net/browse/USE-113
@ghukill ghukill requested a review from a team October 22, 2025 18:27
@ghukill ghukill marked this pull request as ready for review October 22, 2025 18:27
@jonavellecuerdo jonavellecuerdo self-requested a review October 22, 2025 18:55
Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some minor comments but nothing that should block approval

Comment on lines 76 to 77
logger.info("Model downloaded successfully to: %s", result_path)
click.echo(f"Model saved to: {result_path}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the double logging?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, not a blocker

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack! Nice catch, thanks. That was a little bit of copy pasta.

I'm going to update to use f-strings, to be consistent with our approach. But, I have been trying to be more cognizant of logging vs stdout in CLIs lately.

The logging is really just visual, but click.echo() emits that data to stdout. We're not doing anything with it yet, but I think it could be good practice to occassionally log + stdout.

But yeah, that was copy/paste cruft, will update.

Comment on lines 83 to 84
msg = f"Download failed: {e}"
raise click.ClickException(msg) from e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this could just be raise click.ClickException(f"Download failed: {e}") from e?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, some copy/paste from spike code. Removed entirely in new commit.

@ghukill
Copy link
Collaborator Author

ghukill commented Oct 22, 2025

Thanks for the review @ehanson8 - I see you've approved already, and may be away for a couple of days, but just in case you're interested here are some updates based on your comments: 8cac1a0.

Copy link

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Curious to see what methods/approaches you take for shared CLI options as they are implemented.

@ghukill ghukill merged commit 6e1a1a5 into main Oct 23, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants