-
Notifications
You must be signed in to change notification settings - Fork 0
USE 112 & 113 - Base model class and stubs for downloads #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: For this application to create embeddings, it will need some structure to handle the downloading, loading, and use of particular models. Additionally, this application should provide the opinionated functionality to download and zip up a model, either for use within a Docker image, or local testing. How this addresses that need: * Class 'BaseEmbeddingModel' is created. This base class will be extended by actual models we intend to use. This base class will expect any child class to define methods for downloading, loading, and using the model. * Stubs a CLI command 'download-model' which we will use very soon for building a Docker image that contains the model weights. Side effects of this change: * None at this time. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-112 * https://mitlibraries.atlassian.net/browse/USE-113
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some minor comments but nothing that should block approval
embeddings/cli.py
Outdated
| logger.info("Model downloaded successfully to: %s", result_path) | ||
| click.echo(f"Model saved to: {result_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the double logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, not a blocker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack! Nice catch, thanks. That was a little bit of copy pasta.
I'm going to update to use f-strings, to be consistent with our approach. But, I have been trying to be more cognizant of logging vs stdout in CLIs lately.
The logging is really just visual, but click.echo() emits that data to stdout. We're not doing anything with it yet, but I think it could be good practice to occassionally log + stdout.
But yeah, that was copy/paste cruft, will update.
embeddings/cli.py
Outdated
| msg = f"Download failed: {e}" | ||
| raise click.ClickException(msg) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this could just be raise click.ClickException(f"Download failed: {e}") from e?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, some copy/paste from spike code. Removed entirely in new commit.
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Curious to see what methods/approaches you take for shared CLI options as they are implemented.
Purpose and background context
This PR advances USE-112 (CLI stubs) and USE-113 (
Dockerfilebuilding) by introducing some structure to represent models in this CLI application.To build a Docker image that includes the model itself, we will need some opinionated code that understands how to download and configure the model.
The proposed approach is to represent each model as an instance of a base
BaseEmbeddingModelclass. This base class will require methods like:download(): download the model from HuggingFace and zip it up for future useload(): load the model from the zipped up, local copy (not yet stubbed)create_embeddings(): entrypoint for actually creating embeddings (not yet stubbed)At this time, only
download()is stubbed.We also have our first "real" model
OSNeuralSparseDocV3GTE, though only minimal stubs.Lastly, the CLI command
download-modelis added which will be used by theDockerfileduring build as we move into that work.All in all, this PR represents a base model architecture and stubbed methods. No real functionality at this time.
How can a reviewer manually see the effects of these changes?
TODO...
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review