[Enhancement] Add similarity threshold filter #111

dvejsada · 2025-01-19T00:07:31Z

Because LibreChat performs file search by looping over all documents available to the respective endpoint, a large document set (e.g. a larger agent knowledge base) produces large result sets even when most of the content is not relevant. This drives up the number of input tokens sent to the LLM and, with more advanced models, the API cost.

This PR introduces an option to set a similarity threshold. All results scoring over the threshold are filtered out and not returned to LibreChat.

If the similarity threshold is not set, a default value of 1 is used, which means nothing is filtered out. Existing deployments are therefore unaffected, and there is no breaking change.
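A minimal sketch of the filtering described above, not the code from this PR. It assumes each result is a `(document, score)` pair, that the score is distance-like (lower means more relevant) and falls between 0 and 1, and that the setting is read from an environment variable. The variable name `SIMILARITY_THRESHOLD` and both helper functions are hypothetical.

```python
import os


def load_threshold(env=None, name="SIMILARITY_THRESHOLD", default=1.0):
    """Read the threshold from the environment (hypothetical variable name).

    Falls back to 1.0, which filters nothing under the assumption that
    scores lie between 0 and 1.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    return default if raw is None or raw == "" else float(raw)


def filter_by_threshold(results, threshold):
    """Keep (document, score) pairs whose score does not exceed the threshold."""
    return [(doc, score) for doc, score in results if score <= threshold]


# Illustrative data only: scores are assumed to be distance-like.
results = [("chunk A", 0.12), ("chunk B", 0.48), ("chunk C", 0.91)]

# With a threshold of 0.5, the weakest match is dropped.
print(filter_by_threshold(results, 0.5))

# With no variable set, the default of 1.0 keeps every result.
print(filter_by_threshold(results, load_threshold(env={})))
```

Defaulting to the loosest value keeps the filter opt-in: deployments that never set the variable get the same results as before.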

dvejsada · 2025-01-19T00:09:06Z

Closes #109

dvejsada · 2025-01-19T00:10:48Z

Also mitigates this

thoj · 2025-01-23T23:00:58Z

I think we may also need a MAX_RESULT config to limit the number of results returned. My use case is probably a little weird, but I want to search several hundred files. Currently file search returns up to 4 results per file, which fills my context window instantly. In my case the best approach is probably to first filter by relevance as in #111, then return only the MAX_RESULT most relevant results.

dvejsada · 2025-01-24T05:04:53Z

> I think we may also need a MAX_RESULT config to limit the number of results returned. My use case is probably a little weird, but I want to search several hundred files. Currently file search returns up to 4 results per file, which fills my context window instantly. In my case the best approach is probably to first filter by relevance as in #111, then return only the MAX_RESULT most relevant results.

I believe this would have to be implemented at the LibreChat (client) level. To my understanding, LibreChat sends a separate query to the RAG API for each attached file, so there is no way to limit the total output across all files at the RAG API level.
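A rough sketch of what such a caller-side cap could look like, assuming the caller has already collected the per-file responses. It is written in Python purely for illustration and is not LibreChat code. The function `top_results`, the dictionary shape, and the distance-like scores (lower means more relevant) are all assumptions.

```python
import heapq


def top_results(per_file_results, threshold, max_results):
    """Merge per-file results, drop weak matches, keep the best few overall.

    per_file_results maps a file name to a list of (document, score) pairs.
    Scores are assumed to be distance-like: lower means more relevant.
    """
    merged = [
        (doc, score)
        for file_results in per_file_results.values()
        for doc, score in file_results
        if score <= threshold
    ]
    # nsmallest returns the entries sorted from most to least relevant.
    return heapq.nsmallest(max_results, merged, key=lambda pair: pair[1])


# Illustrative data only.
per_file = {
    "a.pdf": [("a1", 0.2), ("a2", 0.7)],
    "b.pdf": [("b1", 0.1), ("b2", 0.95)],
}
print(top_results(per_file, threshold=0.8, max_results=2))
```

Filtering before ranking matches the order suggested in the thread: the threshold removes irrelevant chunks per query, and the cap then bounds the total across all files.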

dvejsada added 2 commits January 19, 2025 01:00

Insert new environment variable for similarity threshold.

2981b63

Add filter of results based on similarity threshold.

cd0a731
