Add SearchIndex and VectorSearchIndex #264
Let's add a check similar to https://github.com/django/django/blob/1429e722f265a4f4229b5f7eaa6a6df3161c342a/django/db/models/constraints.py#L150-L167 and make sure that the schema editor ignores search indexes if they are not supported.
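A minimal sketch of what such a check could look like, written without Django so it runs on its own. `CheckWarning` stands in for `django.core.checks.Warning`; the `supports_atlas_search` flag, the `SearchIndex` error id prefix, and the `W001` suffix are assumptions made for illustration, not names confirmed by this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckWarning:
    """Stand-in for django.core.checks.Warning so the sketch runs without Django."""

    msg: str
    id: str


@dataclass
class Features:
    # Hypothetical feature flag; the real backend may name it differently.
    supports_atlas_search: bool = False


class SearchIndexSketch:
    # Assumed by analogy with the VectorSearchIndex prefix shown in this PR.
    _error_id_prefix = "django_mongodb_backend.indexes.SearchIndex"

    def __init__(self, name):
        self.name = name

    def check(self, model_name, features):
        """Return a warning when the database cannot create this index."""
        if features.supports_atlas_search:
            return []
        return [
            CheckWarning(
                msg=(
                    f"This database does not support {type(self).__name__}; "
                    f"index {self.name!r} on {model_name} will be ignored."
                ),
                id=f"{self._error_id_prefix}.W001",  # hypothetical id
            )
        ]
```

Returning a warning rather than an error mirrors the linked Django constraint check: the project still loads, and the developer is told that the index will be skipped.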
django_mongodb_backend/schema.py
Outdated
# Drop the index if it exists, particularly if it may not have been created previously
# due to lack of Atlas search support, but now the database supports it.
Is it really possible/likely that a database can go from non-Atlas to Atlas?
Maybe @Jibola could answer better, but IMO it is very reasonable: an application adds some AI support and doesn't want to maintain two separate backends. The opposite, going from Atlas to non-Atlas, isn't very reasonable. I decided to handle both.
How do you migrate from non-Atlas to Atlas? Dump the db and load it in a new server? I guess it's possible for this migration scenario to happen, but it doesn't seem very likely. I'd expect the main place where search indexes might be ignored is if they were in a third-party app, so basically the scenario would be: user installs third-party app with ignored search index, user migrates to Atlas, third-party app removes Atlas index. (Do you have another scenario in mind?)
🤔 Yes, I think you are right. But at least the if must remain as it is. Or maybe we could remove the existence check, because if an index was skipped because the backend doesn't support Atlas, then there is nothing to drop.
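The outcome discussed here (keep the support guard, drop the existence check) could be sketched roughly as follows. Everything in it is a stand-in: `FakeCollection`, its `add_search_index` and `remove_search_index` methods, and `SchemaEditorSketch` are hypothetical and are not the backend's or PyMongo's API.

```python
class FakeCollection:
    """In-memory stand-in that only tracks search index names (hypothetical)."""

    def __init__(self):
        self.names = set()

    def add_search_index(self, name):
        self.names.add(name)

    def remove_search_index(self, name):
        # In this stand-in, removing a missing name is a no-op.
        self.names.discard(name)


class SchemaEditorSketch:
    def __init__(self, collection, supports_atlas_search):
        self.collection = collection
        # Hypothetical feature flag passed in directly for the sketch.
        self.supports_atlas_search = supports_atlas_search

    def add_index(self, name):
        """Create the search index, or skip it when unsupported."""
        if not self.supports_atlas_search:
            return False
        self.collection.add_search_index(name)
        return True

    def remove_index(self, name):
        """Drop the search index; the support guard stays, no existence lookup."""
        if not self.supports_atlas_search:
            return False
        self.collection.remove_search_index(name)
        return True
```

On a database without Atlas search, both operations are skipped, so there is never anything to drop. The remaining edge case is the one raised in the thread: an index that was skipped earlier and is removed after a move to Atlas. Whether a real server tolerates dropping an index that was never created is not settled by this sketch.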
name: Django Test Suite
runs-on: ubuntu-latest
steps:
  - name: Checkout django-mongodb-backend
    uses: actions/checkout@v4
    with:
      persist-credentials: false
  - name: install django-mongodb-backend
    run: |
      pip3 install --upgrade pip
      pip3 install -e .
  - name: Checkout Django
    uses: actions/checkout@v4
    with:
      repository: 'mongodb-forks/django'
      ref: 'mongodb-5.1.x'
      path: 'django_repo'
      persist-credentials: false
  - name: Install system packages for Django's Python test dependencies
    run: |
      sudo apt-get update
      sudo apt-get install libmemcached-dev
  - name: Install Django and its Python test dependencies
    run: |
      cd django_repo/tests/
      pip3 install -e ..
      pip3 install -r requirements/py3.txt
  - name: Copy the test settings file
    run: cp .github/workflows/mongodb_settings.py django_repo/tests/
  - name: Copy the test runner file
    run: cp .github/workflows/runtests.py django_repo/tests/runtests_.py
  - name: Start local Atlas
    working-directory: .
    run: bash .github/workflows/start_local_atlas.sh mongodb/mongodb-atlas-local:7
  - name: Run tests
    run: python3 django_repo/tests/runtests_.py
Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium, test)
I think we're missing schema tests with similarities, both as a string and as a list.
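To make the two accepted forms concrete, here is a rough sketch of how a string or a list of similarities might be normalized to one value per vector field. The helper name `normalize_similarities` and its error messages are hypothetical; the three similarity names are the ones Atlas Vector Search documents.

```python
# Similarity functions documented for Atlas Vector Search indexes.
VALID_SIMILARITIES = frozenset(("cosine", "dotProduct", "euclidean"))


def normalize_similarities(similarities, num_vector_fields):
    """Return one similarity per vector field (hypothetical helper).

    A single string is applied to every vector field; a list or tuple must
    provide exactly one entry per vector field.
    """
    if isinstance(similarities, str):
        result = [similarities] * num_vector_fields
    else:
        result = list(similarities)
        if len(result) != num_vector_fields:
            raise ValueError(
                f"Expected {num_vector_fields} similarities, got {len(result)}."
            )
    for similarity in result:
        if similarity not in VALID_SIMILARITIES:
            raise ValueError(f"Unsupported similarity: {similarity!r}.")
    return result
```

A schema test could then build one index with `similarities="cosine"` and another with a list such as `["cosine", "euclidean"]`, and compare the resulting index definitions.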
Some fields such as :class:`~django.db.models.DecimalField` aren't
supported. See the :ref:`Atlas documentation <atlas:bson-data-chart>` for a
complete list of unsupported data types.
Is this accurate? It might be useful to have a check, but maybe we're spending too much time on this. 😉
Yes, we are. Most of the wasted time was my fault.
And yes, it is accurate.
    VALID_FIELD_TYPES = frozenset(("boolean", "date", "number", "objectId", "string", "uuid"))
    _error_id_prefix = "django_mongodb_backend.indexes.VectorSearchIndex"

    def __init__(self, *, fields=(), similarities="cosine", name=None):
How did you decide that cosine should be the default?
To be honest, whenever I have worked with embeddings, cosine was the first similarity I tried, except when the data was normalized (in that case, cosine and dot product give the same result, and dot product is faster). The L2 norm is also used, but less often than cosine in semantic search.
To keep the index simple, I decided to make cosine the default.
I have no strong preference about making similarities a required parameter instead.
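The point about normalized data can be checked with a few lines of plain Python. This is only an illustration with made-up vectors, not code from the PR: once both vectors have unit length, the cosine denominator is 1, so cosine similarity and the dot product coincide.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Cosine similarity: the dot product divided by both norms."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


raw_query, raw_doc = [3.0, 4.0], [1.0, 2.0]
query, doc = normalize(raw_query), normalize(raw_doc)

# On unit vectors the two measures agree, and the dot product skips the norms.
assert math.isclose(cosine(query, doc), dot(query, doc))
# On the raw vectors they differ, because the dot product also reflects magnitude.
assert not math.isclose(cosine(raw_query, raw_doc), dot(raw_query, raw_doc))
```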
The index should reference at least one vector field: an :class:`.ArrayField`
with a :attr:`~.ArrayField.base_field` of :class:`~django.db.models.FloatField`
or :class:`~django.db.models.IntegerField`. It cannot reference an
Does ArrayField(IntegerField) store data in the correct format for this index? binData(int8)? https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/#about-the-similarity-functions
It is stored as an array, which isn't the best data structure for this kind of data, but it works. I remember having a conversation about that with Jib and James.
Search Indexes and Vector search indexes
This PR introduces new index classes to encapsulate the definitions and details of Atlas indexes.