Refactor/optimize embedding module #207
Conversation
… isolated clients
```python
return cached
...
embedding = self._generate_embedding(text)
self._add_to_cache(text, embedding)
```
Somehow that led to a bug, which is why I changed it to `if self._cache is not None`.
This block still triggers, though, if the param `cache=False` is set. If the cache is not being utilized, I don't see any point in filling it.
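To make the point concrete, here is a minimal sketch (class and method names are hypothetical, not the PR's actual code) in which both the cache lookup and the cache fill are gated on `self._cache is not None`, so `cache=False` never populates the cache:

```python
from threading import Lock


class CachedAdapter:
    """Sketch only: with cache=False both cache attributes stay None."""

    def __init__(self, cache: bool = True):
        self._cache = {} if cache else None
        self._cache_lock = Lock() if cache else None

    def get_embedding(self, text: str):
        if self._cache is not None:
            with self._cache_lock:
                cached = self._cache.get(text)
            if cached is not None:
                return cached
        embedding = self._generate_embedding(text)
        if self._cache is not None:  # skip the fill entirely when caching is off
            with self._cache_lock:
                self._cache[text] = embedding
        return embedding

    def _generate_embedding(self, text: str):
        return [float(len(text))]  # stand-in for a real model/API call
```

With this gating, the fill block the reviewer flagged simply cannot run when the cache was disabled at construction time.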
```python
self._cache = None
self._cache_lock = None
...
@abstractmethod
```
The adapters only differ in how they call and interact with their APIs. Previously, common tasks like converting the output into lists were also handled by the child adapters.
I implemented a get_embedding function in the base file that covers all the common aspects of the adapters. This new method calls a protected abstract method called _generate_embedding, which is implemented in each child adapter according to its API. Thus, the adapters only need to implement a very small function.
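The layout described above is the classic template-method pattern. A minimal sketch (illustrative names, not the PR's exact signatures): the base class owns the shared normalization, and each child supplies only the backend call:

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class EmbeddingAdapter(ABC):
    """Base class: shared logic lives in get_embedding, once."""

    def get_embedding(self, text: str) -> List[float]:
        # Common handling: call the backend, then normalize whatever
        # sequence type it returned into a plain list of floats.
        raw = self._generate_embedding(text)
        return [float(x) for x in raw]

    @abstractmethod
    def _generate_embedding(self, text: str) -> Sequence[float]:
        """Backend-specific API call; each child implements only this."""


class DummyAdapter(EmbeddingAdapter):
    def _generate_embedding(self, text: str) -> Sequence[float]:
        return (1, 2, 3)  # e.g. a tuple straight from some client library
```

Each adapter's surface shrinks to one small method, and output conversion is no longer duplicated per backend.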
```python
pass
...
def add_to_cache(self, text: str, embedding: Sequence[float]):
@abstractmethod
```
Having an abstract "private" method here is an anti-pattern; abstract methods are meant to be visible and to be implemented.
This is a protected method, though, as it only has a single leading underscore.
Still, abstract methods should generally be public by design; they are meant to signal that this needs implementation.
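Worth noting for this debate: Python's `abc` machinery enforces implementation regardless of the underscore prefix, so the disagreement is purely about naming convention. A small sketch (hypothetical names) showing that a protected abstract method is still enforced at instantiation time:

```python
from abc import ABC, abstractmethod


class EmbeddingAdapter(ABC):
    @abstractmethod
    def _generate_embedding(self, text: str):
        """Protected, yet abc still refuses to instantiate non-implementers."""


class GoodAdapter(EmbeddingAdapter):
    def _generate_embedding(self, text: str):
        return (1.0,)
```

`GoodAdapter()` works, while instantiating a subclass that forgets `_generate_embedding` raises `TypeError`, exactly as it would for a public abstract method.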
```python
Acts as a unified interface for local Hugging Face models, OpenAI API models, and locally hosted Ollama models.
"""
...
_MODEL_REGISTRY = {
```
I would not restrict this to these models. I think a better way would be to check whether the model exists (via the HF API) and throw an exception on initialization if it does not.
That makes sense as well; I just wanted to type them so users get autocomplete in their IDE.
Also, we cannot check via the HF API, because not all models there use sentence-transformers, and the current adapter sometimes has issues with other models.
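One way to keep the IDE autocomplete without a hard restriction is a widened `Literal` union. This is only a sketch (the model names and function are illustrative, not the PR's registry): type checkers accept any string, while IDEs such as Pylance can still suggest the listed literals:

```python
from typing import Literal, Union

# Known entries listed purely for editor autocomplete (illustrative names);
# the Union with plain str keeps every other model name accepted.
KnownModel = Literal[
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
]


def resolve_model(model_name: Union[KnownModel, str]) -> str:
    # No runtime restriction: unknown names pass through unchanged, so
    # any validation can happen later, e.g. on first model load.
    return model_name
```

This sidesteps the HF API check entirely: unsupported models fail where they always did (at load time), while known-good names stay discoverable in the editor.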
```python
"""Reset global caches and initialize a fresh DummyAdapter before each test."""
_GLOBAL_CACHES.clear()
_GLOBAL_LOCKS.clear()
self.adapter = DummyAdapter(model_name="dummy-model", cache=True)
```
This does not cover cases where `cache=False` if it is set up here like this.
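One way to cover both modes without duplicating the test class is `unittest`'s `subTest`. A sketch under assumed names (the `DummyAdapter` below is a minimal stand-in, not the project's real class):

```python
import unittest


class DummyAdapter:
    """Minimal stand-in for the real adapter, just enough to test caching."""

    def __init__(self, model_name: str, cache: bool = True):
        self._cache = {} if cache else None

    def get_embedding(self, text: str):
        if self._cache is not None and text in self._cache:
            return self._cache[text]
        embedding = [float(len(text))]
        if self._cache is not None:
            self._cache[text] = embedding
        return embedding


class CacheModeTests(unittest.TestCase):
    def test_both_cache_modes(self):
        # subTest repeats the same assertions for cache=True and cache=False,
        # so the cache=False path is no longer left untested by setUp.
        for cache in (True, False):
            with self.subTest(cache=cache):
                adapter = DummyAdapter("dummy-model", cache=cache)
                self.assertEqual(adapter.get_embedding("hi"), [2.0])
                if cache:
                    self.assertEqual(adapter._cache, {"hi": [2.0]})
                else:
                    self.assertIsNone(adapter._cache)
```

Failures are reported per `subTest` parameter, so a regression in only the `cache=False` branch shows up as its own entry.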
```python
"""Reset the class-level model cache and initialize a mocked adapter."""
HuggingFaceAdapter._model_cache.clear()
self.mock_model_instance = mock_transformer.return_value
self.adapter = HuggingFaceAdapter(cache=False)
```

```python
def setUp(self, mock_client):
    """Initialize the adapter with a mocked Ollama client."""
    self.mock_client_instance = mock_client.return_value
    self.adapter = OllamaAdapter(cache=False)
```

```python
def setUp(self, mock_openai):
    """Initialize the adapter with a mocked OpenAI client."""
    self.mock_client = mock_openai.return_value
    self.adapter = GPT4Adapter(api_key="test-key", cache=False)
```
No description provided.