[Bug]: QueryNode memory keeps increasing on search and search_Iterator with DiskANN. #39561

Open
1 task done
akmalmasud96 opened this issue Jan 23, 2025 · 6 comments
Labels: kind/bug (Issues or changes related to a bug), triage/accepted (Indicates an issue or PR is ready to be actively worked on)

Comments

@akmalmasud96

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.1
- Deployment mode(standalone or cluster): AWS EKS cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.2

Current Behavior

We have been using search_iterator, and the query nodes' memory keeps increasing as we search until the nodes eventually crash from running out of memory. A similar failure to release memory was observed with search; there the growth is steady, whereas with search_iterator it appears to grow exponentially. There are 6 query nodes with 60 GB of memory each. The vectors we are searching are 16-bit, 512-dimensional, and the total data is around 653 million.
At the start, before any searching, memory consumption was at 50%; however, it kept increasing as we searched. Attached are the Grafana graphs of the query nodes' memory consumption.
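
As a rough sanity check on the figures above, the following sketch estimates the raw vector footprint against the cluster's memory. It assumes that "653 million" is a vector count and that "16-bit" means 2 bytes per dimension; index overhead, scalar fields, replicas, and how DiskANN splits data between disk and memory are all ignored, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope sizing from the numbers reported in this issue.
NUM_VECTORS = 653_000_000  # assumption: "653 million" counts vectors
DIM = 512                  # dimensions per vector
BYTES_PER_DIM = 2          # 16-bit components
NUM_QUERY_NODES = 6
NODE_MEMORY_GB = 60

raw_bytes = NUM_VECTORS * DIM * BYTES_PER_DIM
raw_gb = raw_bytes / 1e9                       # decimal gigabytes
cluster_gb = NUM_QUERY_NODES * NODE_MEMORY_GB  # total query node memory
baseline_gb = 0.5 * cluster_gb                 # the reported 50% starting point

print(f"raw vectors:     {raw_gb:.1f} GB")      # 668.7 GB
print(f"cluster memory:  {cluster_gb} GB")      # 360 GB
print(f"50% baseline:    {baseline_gb:.1f} GB") # 180.0 GB
```

Under these assumptions the raw vectors alone (about 669 GB) exceed the 360 GB of total query node memory, which fits with a disk-based index being used, and the 50% baseline works out to roughly 30 GB per node.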

The following is the code block we are using for search_iterator.

import tqdm

# `collection`, `search_params`, and `threshold` are defined elsewhere.
def worker_function(vector_embedding):
    iterator = collection.search_iterator(
        data=[vector_embedding],
        anns_field="embeddings",
        param=search_params,
        batch_size=16384,
        expr=None,
        output_fields=["id", "vector_id"],
    )
    results = []
    try:
        while True:
            res = iterator.next()
            if not res.ids():
                # Empty page: the iterator is exhausted.
                break
            for hit in tqdm.tqdm(res):
                if hit.distance < threshold:
                    # Stop at the first hit below the threshold.
                    return results
                results.append({
                    'distance': hit.distance,
                    'vector_id': hit.entity.get('vector_id'),
                    'id': hit.entity.get('id'),
                })
    finally:
        # Close the iterator on every exit path, including errors.
        iterator.close()
    return results

Expected Behavior

No response

Steps To Reproduce

Milvus Log

No response

Anything else?

Attached is the query nodes' behaviour while using search_iterator.

[Image: Grafana graph of query node memory consumption]
@akmalmasud96 added the kind/bug and needs-triage labels Jan 23, 2025
@PwzXxm
Contributor

PwzXxm commented Jan 24, 2025

Thanks for opening the issue. I will look into this. May I ask how many iterators are running concurrently?

Are you trying to obtain results with a specific distance threshold? Maybe you can try out range search https://milvus.io/docs/range-search.md#Range-Search
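
A minimal sketch of what the range-search suggestion could look like with a plain search call. The metric type is not stated in this issue, so `"IP"` is an assumption (the original loop keeps hits with `distance >= threshold`, which suggests a similarity metric), and `build_range_search_params` and `range_search` are hypothetical helper names. As far as the linked range-search docs describe it, `radius` acts as an exclusive bound for such metrics, so a hit exactly at the threshold may be dropped.

```python
def build_range_search_params(threshold, metric_type="IP", extra=None):
    """Build a search-param dict that sets `radius` to the distance threshold.

    `metric_type="IP"` is an assumption; use the metric the index was built with.
    `extra` can carry other index-specific search params.
    """
    params = dict(extra or {})
    params["radius"] = threshold
    return {"metric_type": metric_type, "params": params}


def range_search(collection, vector_embedding, threshold, limit=16384):
    """Run one range search on an already loaded pymilvus Collection."""
    return collection.search(
        data=[vector_embedding],
        anns_field="embeddings",
        param=build_range_search_params(threshold),
        limit=limit,
        output_fields=["id", "vector_id"],
    )
```

With this shape the server applies the threshold, so the client no longer has to page through results and stop on its own.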

@yanliang567
Contributor

/assign @PwzXxm
/unassign

@sre-ci-robot assigned PwzXxm and unassigned yanliang567 Jan 24, 2025
@yanliang567 added the triage/accepted label and removed the needs-triage label Jan 24, 2025
@yanliang567 modified the milestones: 2.5.4, 2.5.5 Jan 24, 2025
@akmalmasud96
Author

@PwzXxm This behavior was observed with only one iterator running.

@akmalmasud96
Author

@PwzXxm It seems that after using the range_search and radius parameters, the query nodes' memory consumption remains constant for search_iterator. Is there a specific reason for that?
I will also share the behavior for plain search.

Image

@yanliang567
Contributor

/assign @qixuan0212
please keep an eye on this issue

@PwzXxm
Contributor

PwzXxm commented Jan 26, 2025

@akmalmasud96 Could you help confirm the behavior, please?

  1. Use search_iterator without the radius parameter, batch_size=16384, iterating until a threshold is met -> memory increases from 32GB to 64GB, causing OOM.
  2. Use search_iterator with the radius parameter (set to the same threshold as above, I guess?), other parameters unchanged -> memory remains constant.
  3. Use search, not search_iterator -> memory remains constant.
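
For reference, scenario 2 could be reproduced with something like the sketch below, which differs from the code in the issue only in that the threshold is passed as `radius` inside the search params. `with_radius` and `iterate_with_radius` are hypothetical helper names, and the sketch assumes `search_params` has the usual `{"metric_type": ..., "params": {...}}` shape.

```python
def with_radius(search_params, threshold):
    """Return a copy of `search_params` with `radius` set; the input is not mutated."""
    merged = dict(search_params)
    merged["params"] = {**search_params.get("params", {}), "radius": threshold}
    return merged


def iterate_with_radius(collection, vector_embedding, search_params, threshold):
    """Scenario 2: search_iterator with the threshold applied server-side."""
    iterator = collection.search_iterator(
        data=[vector_embedding],
        anns_field="embeddings",
        param=with_radius(search_params, threshold),
        batch_size=16384,
        output_fields=["id", "vector_id"],
    )
    results = []
    try:
        while True:
            page = iterator.next()
            if len(page) == 0:
                # Empty page: nothing left inside the radius.
                break
            for hit in page:
                results.append({
                    "distance": hit.distance,
                    "vector_id": hit.entity.get("vector_id"),
                    "id": hit.entity.get("id"),
                })
    finally:
        iterator.close()
    return results
```

Running scenario 1 and this variant against the same query vector while watching the query node memory graphs should show whether the radius is what makes the difference.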

4 participants