Contributor

@Jibola Jibola commented Aug 19, 2025

Summary

Defining field mappings for Atlas Search and Vector Search indexes can get complicated. Our initial SearchIndex and VectorSearchIndex implementations provide reasonable defaults based on each field's type. However, MongoDB power users may want more nuanced index definitions. This PR introduces a way to provide custom mappings for individual fields.

Key changes

  • Added field_mappings parameter to SearchIndex to allow custom Atlas Search field configurations
  • Added DynamicSearchIndex class for dynamic field mapping scenarios.
    • This can be removed (more details below)
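
As a rough illustration of the intended override behavior (not the code in this PR), the sketch below merges inferred per-field types with user-supplied mappings. `build_field_definitions` and the `inferred_types` dict are hypothetical stand-ins for what SearchIndex derives from the model; only the name `field_mappings` comes from this PR.

```python
def build_field_definitions(inferred_types, field_mappings=None):
    """Return Atlas Search field definitions keyed by field name.

    inferred_types: hypothetical {field_name: atlas_type} derived from model fields.
    field_mappings: optional {field_name: full mapping dict} replacing the default.
    """
    field_mappings = field_mappings or {}
    definitions = {}
    for name, type_ in inferred_types.items():
        # Default: only the inferred type.
        definitions[name] = {"type": type_}
        if name in field_mappings:
            # Override: the custom mapping replaces the default entirely.
            definitions[name] = dict(field_mappings[name])
    return definitions


definitions = build_field_definitions(
    {"title": "string", "year": "number"},
    field_mappings={"title": {"type": "autocomplete", "minGrams": 2}},
)
```

Here `title` ends up with the autocomplete mapping, while `year` keeps its inferred default.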

Test Plan

  • Manual Testing
  • Add Test cases

Screenshots

Image of a customized field_mappings entry added in a migration:
[image]

Its representation in MongoDB Compass:
[image]

Checklist

Checklist for Author

  • Does this require a changelog update?
  • Did you update the changelog (if necessary)?
  • Is this a breaking change?
  • Did you run the tests locally?

Checklist for Reviewer

  • Is the PR in the correct format?
  • Can you explain the PR?
  • Do all TODOS have JIRA tickets?
  • Have you checked for spelling & grammar errors?

Additional Considerations

  • This does not have extensive testing. Tests should include:
    • A field override with multiple mappings
    • A SearchIndex constructed without anything provided in fields
    • A field name present in both fields and field_mappings

@Jibola Jibola requested a review from Copilot August 19, 2025 23:03
@Jibola Jibola marked this pull request as draft August 19, 2025 23:03

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances MongoDB Atlas search index functionality by adding field mapping capabilities and index status monitoring. The changes allow developers to specify custom field mappings for search indexes and ensure proper synchronization during index operations.

Key changes:

  • Added field_mappings parameter to SearchIndex to allow custom Atlas Search field configurations
  • Introduced index status monitoring functions to wait for index creation/deletion completion
  • Added DynamicSearchIndex class for dynamic field mapping scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                                Description
django_mongodb_backend/schema.py    Added index status monitoring functions and integrated them into index operations
django_mongodb_backend/indexes.py   Enhanced SearchIndex with field_mappings support and added DynamicSearchIndex class

)
self.field_mappings = field_mappings or {}

fields = list({*fields, *self.field_mappings.keys()})

Copilot AI Aug 19, 2025

Using set unpacking with {*fields, *self.field_mappings.keys()} may not preserve the original order of fields. Consider using list(dict.fromkeys(list(fields) + list(self.field_mappings.keys()))) to maintain order while removing duplicates.

Suggested change
- fields = list({*fields, *self.field_mappings.keys()})
+ fields = list(dict.fromkeys(list(fields) + list(self.field_mappings.keys())))
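
To illustrate the difference between the two approaches, here is a small standalone sketch. The field names are made up for the example.

```python
fields = ["title", "body", "year"]
field_mappings = {"body": {"type": "string"}, "tags": {"type": "token"}}

# A set removes duplicates, but its iteration order is arbitrary.
unordered = list({*fields, *field_mappings.keys()})

# dict.fromkeys() removes duplicates and keeps first-seen order
# (dict insertion order is guaranteed since Python 3.7).
ordered = list(dict.fromkeys([*fields, *field_mappings]))
```

Both lists contain the same four names, but only `ordered` is guaranteed to start with the names from `fields` in their original order.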

@Jibola Jibola changed the title from "Allow additional fields_mappings to get added to SearchIndexModel configurations" to "INTPYTHON-729: Improve flexibility and QOL of Atlas/Vector Search Index Configurations" Aug 19, 2025
@Jibola Jibola changed the title from "INTPYTHON-729: Improve flexibility and QOL of Atlas/Vector Search Index Configurations" to "INTPYTHON-729: (PoC) Improve flexibility and QOL of Atlas/Vector Search Index Configurations" Aug 19, 2025
Collaborator

@timgraham timgraham left a comment

I imagined the index type API as subclasses like AutocompleteSearchIndex but I guess that's not flexible enough if an index has multiple fields with different types.

field_path = column_prefix + model._meta.get_field(field_name).column
fields[field_path] = {"type": type_}
if field_name in self.field_mappings:
    fields[field_path] = self.field_mappings[field_name].copy()
Collaborator

Why copy()?
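
For context on this question, here is a standalone sketch of what a shallow `copy()` does and does not protect against. `user_mapping` is an illustrative value, not taken from the PR. A shallow copy keeps top-level edits to the generated definition from leaking into the caller's dict, but nested values are still shared; `copy.deepcopy()` would be needed to isolate those.

```python
import copy

user_mapping = {
    "type": "string",
    "multi": {"keyword": {"type": "string", "analyzer": "lucene.keyword"}},
}

shallow = user_mapping.copy()
shallow["type"] = "token"  # top-level change: the original is unaffected
shallow["multi"]["keyword"]["analyzer"] = "lucene.standard"  # nested change: shared with the original

deep = copy.deepcopy(user_mapping)
deep["multi"]["keyword"]["analyzer"] = "lucene.whitespace"  # fully isolated from the original
```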

Comment on lines 169 to 171
if field_name in self.field_mappings:
    fields[field_path] = self.field_mappings[field_name].copy()
Collaborator

Is field_mappings really supposed to contain the entire mapping? (e.g. "type" too). I'd think it would be more likely to be interpreted as "extra options to add to the field".

Contributor Author

Yeah, type in the Atlas Search Field Mapping refers to the Atlas Search Field Type. We infer type from our fields, but, for instance, strings can be interpreted as four different types:

  • string (we infer)
  • token
  • stringFacet
  • autocomplete
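
For example, the same string field could be mapped in any of the ways below. This is a sketch: the option names follow the Atlas Search field-type documentation as I understand it, and the field name `title` is made up. A field can also take a list of mappings when it should be indexed as several types at once.

```python
string_field_mappings = {
    "string": {"type": "string", "analyzer": "lucene.standard"},
    "token": {"type": "token"},
    "stringFacet": {"type": "stringFacet"},
    "autocomplete": {
        "type": "autocomplete",
        "tokenization": "edgeGram",
        "minGrams": 2,
        "maxGrams": 15,
    },
}

# One field indexed as both full-text and autocomplete.
fields = {
    "title": [
        string_field_mappings["string"],
        string_field_mappings["autocomplete"],
    ]
}
```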

Collaborator

@timgraham timgraham Sep 23, 2025

Got it. Your original PR combined fields and field_mappings, but I made these arguments mutually exclusive. (Possibly a separate class, e.g. "MappedSearchIndex", would be a better separation of concerns than having mutually exclusive arguments.)
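
A minimal sketch of such a mutually exclusive check, for illustration only. `SketchSearchIndex` is a hypothetical stand-in rather than the class in this PR, and the error messages are invented.

```python
class SketchSearchIndex:
    """Hypothetical stand-in that accepts either fields or field_mappings."""

    def __init__(self, *, fields=(), field_mappings=None, name=None):
        if fields and field_mappings:
            raise ValueError("Cannot provide both 'fields' and 'field_mappings'.")
        if field_mappings is not None and not isinstance(field_mappings, dict):
            raise ValueError("'field_mappings' must be a dictionary.")
        self.field_mappings = field_mappings or {}
        # With field_mappings, the indexed fields are the mapping's keys.
        self.fields = list(fields) or list(self.field_mappings)
        self.name = name


index = SketchSearchIndex(field_mappings={"title": {"type": "token"}}, name="title_idx")
```

A separate subclass would move the second branch into its own `__init__()` and drop the check entirely, which is the trade-off raised above.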

Comment on lines +181 to +185
class DynamicSearchIndex(SearchIndex):
    suffix = "dsix"
    _error_id_prefix = "django_mongodb_backend.indexes.DynamicSearchIndex"
Contributor Author

Here I can override the __init__() and have it be something like this:

class DynamicSearchIndex(SearchIndex):
    def __init__(...):
        # ("id") is just the string "id"; a one-element tuple needs the trailing comma.
        super().__init__(fields=("id",), name=name, field_mappings=field_mappings)

Overall design should be properly bike-shed either way.

Collaborator

In the PR description below DynamicSearchIndex, you wrote, "This can be removed (more details below)". I'm not sure if this is those details. Perhaps DynamicSearchIndex should be a separate ticket.

Contributor Author

@Jibola Jibola Sep 23, 2025

Ah sorry. I left it, but I ultimately want to remove it. I don't think it has a valuable place in the Django framework. We should emphasize clear type-safety at the expense of losing some looser constraints such as "Dynamic Indexing" which is documented as slower/more bloated than explicit indexing.

I can definitely see an argument for its use and this implementation was extremely straightforward, but I would rather we get customer requests first.
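
For reference, the two styles differ only in the `mappings` document sent to Atlas. A sketch with made-up field names:

```python
# Dynamic: Atlas indexes every field of a supported type it finds.
dynamic_definition = {"mappings": {"dynamic": True}}

# Explicit (static): only the listed fields are indexed, with the given types.
explicit_definition = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "title": {"type": "string"},
            "year": {"type": "number"},
        },
    }
}
```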

@timgraham timgraham changed the title from "INTPYTHON-729: (PoC) Improve flexibility and QOL of Atlas/Vector Search Index Configurations" to "INTPYTHON-729 Allow creating search indexes with field mappings" Sep 12, 2025
@timgraham timgraham force-pushed the INTPYTHON-729 branch 2 times, most recently from b978a65 to 48c1495 Compare September 12, 2025 20:44
@timgraham
Collaborator

I think this is functionally complete for field_mappings, but it still doesn't support top-level definition options like "analyzer" and "searchAnalyzer" (see example) [not sure if important].
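
To make the gap concrete, here is a sketch of where those options sit in an Atlas Search definition, next to `mappings` rather than inside a field. `build_definition` is a hypothetical helper, not part of this PR or the backend.

```python
def build_definition(fields, analyzer=None, search_analyzer=None):
    """Build an Atlas Search definition with optional top-level analyzers."""
    definition = {"mappings": {"dynamic": False, "fields": fields}}
    if analyzer is not None:
        definition["analyzer"] = analyzer
    if search_analyzer is not None:
        definition["searchAnalyzer"] = search_analyzer
    return definition


definition = build_definition(
    {"title": {"type": "string"}},
    analyzer="lucene.standard",
    search_analyzer="lucene.standard",
)
```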

VectorSearchIndex doesn't take mappings in the same way (see syntax). The existing implementation supports some options (numDimensions, similarity) but not others (quantization, hnswOptions). If important to add, let's create a separate issue.
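
For comparison, a sketch of the vector search definition shape, which uses a list of field entries instead of a `mappings` document. `vector_field` is a hypothetical helper and the paths are made up. The `quantization` key is shown as one of the currently unsupported options; other options such as hnswOptions would pass through `**extra` the same way.

```python
def vector_field(path, num_dimensions, similarity, **extra):
    """Build one vector field entry; extra holds optional keys such as quantization."""
    return {
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
        **extra,
    }


definition = {
    "fields": [
        vector_field("embedding", 1536, "cosine", quantization="scalar"),
        {"type": "filter", "path": "genre"},
    ]
}
```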
