
Commit a94277f

kotwanikunal, kolchfa-aws, and natebower authored
Mark nmslib references for vector search as deprecated (#9107)
* Mark nmslib references for vector search as deprecated

  Signed-off-by: Kunal Kotwani <[email protected]>

* Apply suggestions from code review

  Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Kunal Kotwani <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
1 parent f8edf06 commit a94277f

File tree

8 files changed, +29 -29 lines changed

_field-types/supported-field-types/knn-vector.md

Lines changed: 10 additions & 10 deletions
@@ -48,9 +48,9 @@ Vector search involves trade-offs between low-latency and low-cost search. Speci

  The following modes are currently supported.

- | Mode | Default engine | Description |
+ | Mode | Default engine | Description |
  |:---|:---|:---|
- | `in_memory` (Default) | `nmslib` | Prioritizes low-latency search. This mode uses the `nmslib` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. |
+ | `in_memory` (Default) | `faiss` | Prioritizes low-latency search. This mode uses the `faiss` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. |
  | `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-pass approach to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. |

  To create a k-NN index that uses the `on_disk` mode for low-cost search, send the following request:
@@ -81,14 +81,14 @@ PUT test-index

  The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values.

- | Compression level | Supported engines |
- |:------------------|:-------------------------------|
- | `1x` | `faiss`, `lucene`, and `nmslib` |
- | `2x` | `faiss` |
- | `4x` | `lucene` |
- | `8x` | `faiss` |
- | `16x` | `faiss` |
- | `32x` | `faiss` |
+ | Compression level | Supported engines |
+ |:------------------|:---------------------------------------------|
+ | `1x` | `faiss`, `lucene`, and `nmslib` (deprecated) |
+ | `2x` | `faiss` |
+ | `4x` | `lucene` |
+ | `8x` | `faiss` |
+ | `16x` | `faiss` |
+ | `32x` | `faiss` |

  For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 96` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression.

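The `on_disk` request body referenced in the context line above is not included in this diff. As a rough sketch only, a mapping that opts into `on_disk` mode might look like the following; the index name `test-index`, field name `my_vector`, and dimension are illustrative placeholders rather than values taken from the page:

```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "mode": "on_disk"
      }
    }
  }
}
```

An optional `compression_level` (for example, `"compression_level": "32x"`) can be set alongside `mode` to control the quantization factor described in the table above.
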
_ml-commons-plugin/remote-models/async-batch-ingestion.md

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ PUT /my-nlp-index
  "type": "knn_vector",
  "dimension": 384,
  "method": {
- "engine": "nmslib",
+ "engine": "faiss",
  "space_type": "cosinesimil",
  "name": "hnsw",
  "parameters": {
@@ -65,7 +65,7 @@ PUT /my-nlp-index
  "type": "knn_vector",
  "dimension": 384,
  "method": {
- "engine": "nmslib",
+ "engine": "faiss",
  "space_type": "cosinesimil",
  "name": "hnsw",
  "parameters": {

_search-plugins/knn/api.md

Lines changed: 3 additions & 3 deletions
@@ -173,11 +173,11 @@ For the warmup operation to function properly, follow these best practices:
  Introduced 2.14
  {: .label .label-purple }

- During approximate k-NN search or warmup operations, the native library indexes (`nmslib` and `faiss` engines) are loaded into native memory. Currently, you can evict an index from cache or native memory by either deleting the index or setting the k-NN cluster settings `knn.cache.item.expiry.enabled` and `knn.cache.item.expiry.minutes`, which removes the index from the cache if it is idle for a given period of time. However, you cannot evict an index from the cache without deleting the index. To solve this problem, you can use the k-NN clear cache API operation, which clears a given set of indexes from the cache.
+ During approximate k-NN search or warmup operations, the native library indexes (for the `faiss` and `nmslib` [deprecated] engines) are loaded into native memory. Currently, you can evict an index from the cache or native memory by either deleting the index or setting the k-NN cluster settings `knn.cache.item.expiry.enabled` and `knn.cache.item.expiry.minutes`, which removes the index from the cache if it is idle for a given period of time. However, you cannot evict an index from the cache without deleting the index. To solve this problem, you can use the k-NN clear cache API operation, which clears a given set of indexes from the cache.

  The k-NN clear cache API evicts all native library files for all shards (primaries and replicas) of all indexes specified in the request. Similarly to how the [warmup operation](#warmup-operation) behaves, the k-NN clear cache API is idempotent, meaning that if you try to clear the cache for an index that has already been evicted from the cache, it does not have any additional effect.

- This API operation only works with indexes created using the `nmslib` and `faiss` engines. It has no effect on indexes created using the `lucene` engine.
+ This API operation only works with indexes created using the `faiss` and `nmslib` (deprecated) engines. It has no effect on indexes created using the `lucene` engine.
  {: .note}

  #### Usage
@@ -236,7 +236,7 @@ Response field | Description
  `error` | An error message explaining why the model is in a failed state.
  `space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note - this value can be set in the top-level of the request as well
  `dimension` | The dimensionality of the vector space for which this model is designed.
- `engine` | The native library used to create the model, either `faiss` or `nmslib`.
+ `engine` | The native library used to create the model, either `faiss` or `nmslib` (deprecated).

  ### Usage

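The `#### Usage` example itself falls outside this hunk. As an illustrative sketch of the clear cache operation described above, with placeholder index names, the call takes a comma-separated list of indexes:

```json
POST /_plugins/_knn/clear_cache/my-index-1,my-index-2
```

The warmup operation mentioned above follows a similar path pattern (`/_plugins/_knn/warmup/<index-list>`).
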
_search-plugins/knn/approximate-knn.md

Lines changed: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ Because the native library indexes are constructed during indexing, it is not po

  Each of the three engines used for approximate k-NN search has its own attributes that make one more sensible to use than the others in a given situation. You can follow the general information below to help determine which engine will best meet your requirements.

- In general, nmslib outperforms both faiss and Lucene on search. However, to optimize for indexing throughput, faiss is a good option. For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes.
+ In general, NMSLIB (deprecated) outperforms both Faiss and Lucene when used for search operations. However, to optimize for indexing throughput, Faiss is a good option. For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes.

  When considering cluster node sizing, a general approach is to first establish an even distribution of the index across the cluster. However, there are other considerations. To help make these choices, you can refer to the OpenSearch managed service guidance in the section [Sizing domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html).

@@ -33,7 +33,7 @@ When considering cluster node sizing, a general approach is to first establish a
  To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with `index.knn` set to `true`. This setting tells the plugin to create native library indexes for the index.

  Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two
- `knn_vector` fields, one using `faiss` and the other using `nmslib` fields:
+ `knn_vector` fields, one using `faiss` and the other using `nmslib` (deprecated) fields:

  ```json
  PUT my-knn-index-1
@@ -52,7 +52,7 @@ PUT my-knn-index-1
  "space_type": "l2",
  "method": {
  "name": "hnsw",
- "engine": "nmslib",
+ "engine": "faiss",
  "parameters": {
  "ef_construction": 128,
  "m": 24
@@ -294,7 +294,7 @@ The following table provides information about the `ef_search` parameter for the

  Engine | Radial query support | Notes
  :--- | :--- | :---
- `nmslib` | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting.
+ `nmslib` (Deprecated) | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting.
  `faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting.
  `lucene` | No | When creating a search query, you must specify `k`. If you provide both `k` and `ef_search`, then the larger value is passed to the engine. If `ef_search` is larger than `k`, you can provide the `size` parameter to limit the final number of results to `k`.

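For orientation, a query against an index like `my-knn-index-1` above typically takes the following form; the field name, vector values, dimension, and `k` are illustrative only and are not taken from the page:

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2.0, 3.0, 5.0, 6.0],
        "k": 2
      }
    }
  }
}
```

For the `faiss` and `nmslib` (deprecated) engines, `ef_search` falls back to the `index.knn.algo_param.ef_search` index setting unless it is supplied in the query, as summarized in the radial query table above.
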
_search-plugins/knn/filter-search-knn.md

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ The following table summarizes the preceding filtering use cases.
  Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
  :--- | :--- | :--- | :---
  Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
- Boolean filter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause. Must be a leaf clause.
- The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause.
+ Boolean filter | After search (post-filtering) | Approximate | - `lucene` <br> - `faiss` <br> - `nmslib` (deprecated) | Outside the k-NN query clause. Must be a leaf clause.
+ The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib` (deprecated) <br> - `faiss` | Outside the k-NN query clause.
  Scoring script filter | Before search (pre-filtering) | Exact | N/A | Inside the script score query clause.

  ## Filtered search optimization

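To make the first row of the table concrete, an efficient k-NN filter is placed inside the k-NN query clause itself. A minimal sketch, assuming a hypothetical `products` index with an `item_vector` field and a `category` keyword field, might look like this:

```json
GET products/_search
{
  "size": 5,
  "query": {
    "knn": {
      "item_vector": {
        "vector": [7.1, 8.3],
        "k": 5,
        "filter": {
          "term": {
            "category": "shoes"
          }
        }
      }
    }
  }
}
```

A Boolean or `post_filter` approach would instead place the filter outside the k-NN query clause, as the last column of the table indicates.
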
_search-plugins/knn/knn-index.md

Lines changed: 4 additions & 4 deletions
@@ -94,10 +94,10 @@ Mapping parameter | Required | Default | Updatable | Description
  :--- | :--- | :--- | :--- | :---
  `name` | true | n/a | false | The identifier for the nearest neighbor method.
  `space_type` | false | l2 | false | The vector space used to calculate the distance between vectors. Note: This value can also be specified at the top level of the mapping.
- `engine` | false | faiss | false | The approximate k-NN library to use for indexing and search. The available libraries are `faiss`, `nmslib`, and `lucene`.
+ `engine` | false | faiss | false | The approximate k-NN library to use for indexing and search. The available libraries are `faiss`, `lucene`, and `nmslib` (deprecated).
  `parameters` | false | null | false | The parameters used for the nearest neighbor method.

- ### Supported nmslib methods
+ ### Supported NMSLIB methods

  Method name | Requires training | Supported spaces | Description
  :--- | :--- | :--- | :---
@@ -110,7 +110,7 @@ Parameter name | Required | Default | Updatable | Description
  `ef_construction` | false | 100 | false | The size of the dynamic list used during k-NN graph creation. Higher values result in a more accurate graph but slower indexing speed.
  `m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.

- For nmslib, *ef_search* is set in the [index settings](#index-settings).
+ For nmslib (deprecated), *ef_search* is set in the [index settings](#index-settings).
  {: .note}

  An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` value (`512`).
@@ -372,7 +372,7 @@ At the moment, several parameters defined in the settings are in the deprecation
  Setting | Default | Updatable | Description
  :--- | :--- | :--- | :---
  `index.knn` | false | false | Whether the index should build native library indexes for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but approximate k-NN search functionality will be disabled.
- `index.knn.algo_param.ef_search` | 100 | true | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. Only available for NMSLIB.
+ `index.knn.algo_param.ef_search` (Deprecated) | 100 | true | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. Only available for NMSLIB.
  `index.knn.advanced.approximate_threshold` | 15,000 | true | The number of vectors a segment must have before creating specialized data structures for approximate search. Set to `-1` to disable building vector data structures and `0` to always build them.
  `index.knn.algo_param.ef_construction` | 100 | false | Deprecated in 1.0.0. Instead, use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value.
  `index.knn.algo_param.m` | 16 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead.

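Pulling the mapping parameters above together, a method definition that spells out the listed defaults might be sketched as follows; the field name and dimension are placeholders:

```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 128,
  "method": {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "faiss",
    "parameters": {
      "ef_construction": 100,
      "m": 16
    }
  }
}
```

With the `nmslib` engine (deprecated), `ef_search` would instead come from the `index.knn.algo_param.ef_search` index setting, per the note above.
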
_search-plugins/knn/knn-vector-quantization.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ has_math: true

  # k-NN vector quantization

- By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.
+ By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for the native `faiss` and `nmslib` [deprecated] engines). To reduce the memory footprint, you can use vector quantization.

  OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, product quantization (PQ), and binary quantization (BQ).

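As one illustration of the quantization types listed above, 16-bit scalar quantization with the `faiss` engine is typically configured through an encoder in the method definition. The following is a sketch with a placeholder field name and dimension, not an excerpt from the page:

```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 128,
  "method": {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "encoder": {
        "name": "sq",
        "parameters": {
          "type": "fp16"
        }
      }
    }
  }
}
```

Each dimension is then stored in 2 bytes instead of 4, trading a small amount of recall for roughly half the memory footprint.
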
_search-plugins/vector-search.md

Lines changed: 3 additions & 3 deletions
@@ -40,7 +40,7 @@ PUT test-index
  "space_type": "l2",
  "method": {
  "name": "hnsw",
- "engine": "nmslib",
+ "engine": "faiss",
  "parameters": {
  "ef_construction": 128,
  "m": 24
@@ -88,12 +88,12 @@ The following table lists the combinations of search methods and libraries suppo

  Method | Engine
  :--- | :---
- HNSW | NMSLIB, Faiss, Lucene
+ HNSW | Faiss, Lucene, NMSLIB (deprecated)
  IVF | Faiss

  ### Engine recommendations

- In general, select NMSLIB or Faiss for large-scale use cases. Lucene is a good option for smaller deployments and offers benefits like smart filtering, where the optimal filtering strategy—pre-filtering, post-filtering, or exact k-NN—is automatically applied depending on the situation. The following table summarizes the differences between each option.
+ In general, select Faiss for large-scale use cases. Lucene is a good option for smaller deployments and offers benefits like smart filtering, where the optimal filtering strategy—pre-filtering, post-filtering, or exact k-NN—is automatically applied depending on the situation. The following table summarizes the differences between each option.

  | | NMSLIB/HNSW | Faiss/HNSW | Faiss/IVF | Lucene/HNSW |
  |:---|:---|:---|:---|:---|

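Following the recommendation above, a smaller deployment could swap the `faiss` engine in the `test-index` example for `lucene`. A hedged sketch, with an illustrative dimension and the same HNSW parameters shown in the diff, might look like this:

```json
PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
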