.Net: WIP on LINQ-based criteria filtering #10273

roji · 2025-01-23T13:44:08Z

This is some draft work on LINQ-based criteria filtering (#10156). Basic equality, AND and OR are implemented for Qdrant, and tested via a specification test suite that should be usable more or less as-is across all other providers (#10194). There's a lot more work to do here, with many possible improvements; I'm submitting at an early draft stage to gather feedback.

To see what this looks like, see these tests.

Note that this adds new projects outside of the existing "IntegrationTests" project; it's difficult to work when all provider tests are in the same project (any experimental breaking change in the abstraction must be propagated everywhere), and in any case, I think we want to end up in a place where each provider has its own separate test project.

/cc @westey-m

roji · 2025-01-23T13:49:00Z

dotnet/src/Connectors/Connectors.Memory.Qdrant/QdrantFilterTranslator.cs

+        var right = this.Visit(andAlso.Right);
+
+        // Qdrant doesn't allow arbitrary nesting of logical operators, only one MUST list (AND), one SHOULD list (OR), and one MUST_NOT list (AND NOT).
+        // We can combine MUST and MUST_NOT; but we can only combine SHOULD if it's the *only* thing on the one side (no MUST/MUST_NOT), and there's no SHOULD on the other (only MUST/MUST_NOT).


Am not 100% sure about Qdrant's capabilities here, should confirm.
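
To make the combination rule in the comment above concrete, here is a minimal sketch of AND-ing two flat filters. It assumes Qdrant behaves as the comment describes, which is still unconfirmed. `FlatFilter` and `FlatFilterCombiner` are hypothetical stand-ins with conditions modelled as plain strings; they are not the Qdrant client types or the translator's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a flat filter: one MUST list (AND),
// one SHOULD list (OR) and one MUST_NOT list (AND NOT).
public sealed record FlatFilter(
    IReadOnlyList<string> Must,
    IReadOnlyList<string> Should,
    IReadOnlyList<string> MustNot)
{
    // True when SHOULD is the only populated list.
    public bool IsShouldOnly => Should.Count > 0 && Must.Count == 0 && MustNot.Count == 0;
}

public static class FlatFilterCombiner
{
    // Combines two filters with AND, following the rule in the comment:
    // MUST and MUST_NOT always merge; SHOULD merges only if it is the sole
    // content of one side and the other side has no SHOULD at all.
    public static FlatFilter AndAlso(FlatFilter left, FlatFilter right)
    {
        bool bothHaveShould = left.Should.Count > 0 && right.Should.Count > 0;
        bool leftMixesShould = left.Should.Count > 0 && !left.IsShouldOnly;
        bool rightMixesShould = right.Should.Count > 0 && !right.IsShouldOnly;

        if (bothHaveShould || leftMixesShould || rightMixesShould)
        {
            throw new NotSupportedException(
                "This combination cannot be expressed as a single flat filter.");
        }

        // Safe to concatenate: a side that carries SHOULD has empty MUST/MUST_NOT.
        return new FlatFilter(
            left.Must.Concat(right.Must).ToList(),
            left.Should.Concat(right.Should).ToList(),
            left.MustNot.Concat(right.MustNot).ToList());
    }
}
```

Under this rule, `(a AND NOT b) AND (c OR d)` flattens into a single filter, while `(a OR b) AND (c OR d)` is rejected because both sides carry a SHOULD list.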

roji · 2025-01-23T13:51:14Z

dotnet/src/Connectors/VectorData.Abstractions/VectorSearch/IVectorizableTextSearch.cs

@@ -20,6 +20,6 @@ public interface IVectorizableTextSearch<TRecord>
    /// <returns>The records found by the vector search, including their result scores.</returns>
    Task<VectorSearchResults<TRecord>> VectorizableTextSearchAsync(
        string searchText,
-        VectorSearchOptions? options = default,
+        VectorSearchOptions<TRecord>? options = default,


Making VectorSearchOptions generic over TRecord isn't trivial... Modern C# users can use target-typed new, in which case the generic parameter doesn't get in the way when calling the method. But people not using target-typed new would have to explicitly specify the full type every time (even if they don't use filtering at all), which maybe isn't ideal. The alternative would be to move the LINQ filter to an optional parameter directly on the method (no strong feelings from my side here).
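
A small sketch of the trade-off described above. `Hotel`, `SearchOptions<TRecord>` and `TextSearch<TRecord>` are hypothetical stand-ins rather than the real abstraction types; they only mirror the shape of an API whose options type is generic over the record type.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

// Hypothetical record type, for illustration only.
public sealed class Hotel
{
    public int Rating { get; set; }
}

// Hypothetical stand-in for an options type that is generic over the record.
public sealed class SearchOptions<TRecord>
{
    public int Top { get; init; } = 3;
    public Expression<Func<TRecord, bool>>? Filter { get; init; }
}

// Hypothetical stand-in for a search API; TRecord is fixed by the instance.
public sealed class TextSearch<TRecord>
{
    public string Describe(string searchText, SearchOptions<TRecord>? options = default)
        => $"top={(options?.Top ?? 3)}, filtered={(options?.Filter is not null)}";
}

public static class OptionsDemo
{
    public static string[] Run()
    {
        var search = new TextSearch<Hotel>();
        return new[]
        {
            // No options: the generic parameter is invisible to the caller.
            search.Describe("pool"),
            // Target-typed new (C# 9+): the compiler infers SearchOptions<Hotel>.
            search.Describe("pool", new() { Top = 5 }),
            // Without target-typed new, the full generic type must be spelled out.
            search.Describe("pool", new SearchOptions<Hotel> { Filter = h => h.Rating == 5 }),
        };
    }
}
```

Only the third call pays the cost discussed here: the caller has to name the record type again even though the search instance already fixes it.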

dotnet/SK-dotnet.sln

dotnet/src/Connectors/VectorData.Abstractions/VectorSearch/VectorSearchOptions.cs

dmytrostruk · 2025-01-23T15:28:30Z

dotnet/src/VectorDataTests/VectorDataSpecificationTests/Filter/BasicFilterTestsBase.cs

+    where TKey : notnull
+{
+    [Fact]
+    public virtual async Task Equality_with_int()


If we can keep PascalCase as the naming convention for test methods, as in the other tests in the codebase, that would be great :)

I'm mainly anticipating all these moving out to the Microsoft.Extensions repo, where IIRC the convention is like this. But I can rename to match SK conventions and change later if you prefer.

dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantBuilder.cs

dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantConfiguration.cs

dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantContainer.cs

westey-m · 2025-01-23T16:44:09Z

@roji, a good scenario to also include in early exploration would be tag-based operations, e.g. where there is an array of strings (tags) in a field in the database, and we want to check whether one or more of the tags we provide match those.
We have received consistent feedback that this is a popular scenario for vector search filtering.

westey-m · 2025-01-23T16:52:02Z

dotnet/src/VectorDataTests/VectorDataSpecificationTests/Filter/BasicFilterTestsBase.cs

+            e.Int2 == a.Int2);
+    }
+
+    [Fact]


One of the issues I have been facing with abstract base class tests is that a few of the DBs don't perform well on the Build server for integration tests, so we typically run them manually. Mostly issues with failed startup and flakiness.
I generalized some work done for Pinecone that allows us to have attribute based test disabling, which allows us to have an attribute on the inheriting class that disables all tests in the inheriting and base class.
Check out VectorStoreFact, PineconeApiKeySetConditionAttribute and DisableVectorStoreTestsAttribute for how this works.
Passing a Skip reason to Fact doesn't work because it requires a static reason, and it would just disable the tests for all inheritors.

Yeah, I remember this... I'll take a look.

But just off the top of my head, in CI we could run tests only for those providers where there's a good container-based option (like Qdrant), and not run them for the others (splitting the test projects by provider could also help with that a bit). Locally, developers can simply explicitly choose what it is they want to run - though that would mean that running all tests in the solution would produce failures. If that's not acceptable, a conditional [Fact] attribute like what you mention above could be good, based on some environment variable or whatever.

In any case, my hope is that this could be orthogonal to the tests themselves being general across providers, as opposed to written per-provider... But we'll see.

I quite like the idea of the test conditionality being based on whether the required secrets have been configured like in PineconeApiKeySetConditionAttribute, since it removes the need to change build settings in addition to setting the build secrets. Locally, I prefer skipped to failure, since failures need to be investigated, while the meaning of something being skipped is clear without further investigation.

Makes sense, will do that.
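
As a rough illustration of the approach agreed on above, a conditional fact attribute can set `Skip` at discovery time when a required setting is missing, so unconfigured tests show up as skipped rather than failed. This is only a sketch against xUnit v2's `FactAttribute`; `EnvironmentVariableFactAttribute` and the `EXAMPLE_DB_API_KEY` variable are hypothetical names, and this is not how `VectorStoreFact` or `PineconeApiKeySetConditionAttribute` are actually implemented.

```csharp
using System;
using Xunit;

// Hypothetical attribute: skips the test unless the named environment
// variable is set. Unlike [Fact(Skip = "...")], the reason is computed
// when the attribute is constructed, so it can depend on configuration.
public sealed class EnvironmentVariableFactAttribute : FactAttribute
{
    public EnvironmentVariableFactAttribute(string variableName)
    {
        if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable(variableName)))
        {
            this.Skip = $"Environment variable '{variableName}' is not set.";
        }
    }
}

public class ExampleProviderTests
{
    // Reported as skipped, not failed, when the secret is not configured.
    [EnvironmentVariableFact("EXAMPLE_DB_API_KEY")]
    public void Runs_only_when_configured()
    {
        Assert.True(true);
    }
}
```

A class-level variant that disables every test in both the inheriting and base class, as described for `DisableVectorStoreTestsAttribute`, would need additional discovery logic that this sketch does not cover.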

roji · 2025-01-23T22:30:51Z

a good scenario to also include in early exploration would be tag based operations, e.g. where there is an array of strings (tags) in a field in the database, and we want to check if one or more out of tags we provide match those.
We have received consistent feedback that this is a popular scenario for vector search filtering.

@westey-m thanks - will do that, it should be easily representable in LINQ by doing b => b.Tags.Contains("foo"). Maybe let's talk tomorrow about how that's represented e.g. in Qdrant (just because I'm currently focusing on it).
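
To show what a translator would actually receive for such a filter, here is a minimal sketch that inspects the expression tree for `b => b.Tags.Contains("foo")`. `Blog` and `TagFilterDemo` are hypothetical, and `Tags` is assumed to be a `List<string>` so that `Contains` binds to the instance method; this is not the PR's translator code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical record type with a tags field, for illustration only.
public sealed class Blog
{
    public List<string> Tags { get; set; } = new();
}

public static class TagFilterDemo
{
    // The filter from the comment above, captured as an expression tree.
    public static readonly Expression<Func<Blog, bool>> Filter =
        b => b.Tags.Contains("foo");

    // Extracts the field name and the constant from `x => x.Field.Contains(constant)`.
    public static (string Field, object? Value) Describe(Expression<Func<Blog, bool>> filter)
    {
        if (filter.Body is MethodCallExpression call
            && call.Method.Name == "Contains"
            && call.Object is MemberExpression member
            && call.Arguments.Count == 1
            && call.Arguments[0] is ConstantExpression constant)
        {
            return (member.Member.Name, constant.Value);
        }

        throw new NotSupportedException("Only x => x.Field.Contains(constant) is handled in this sketch.");
    }
}
```

Calling `TagFilterDemo.Describe(TagFilterDemo.Filter)` yields the field name `Tags` and the value `"foo"`, which a provider could then map to its own array-match condition. If the tag comes from a captured local rather than a literal, the argument is a member access on a compiler-generated closure object instead of a `ConstantExpression`, so it would have to be evaluated or inlined first.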


WIP on LINQ-based criteria filtering

4f4fd93

markwallace-microsoft added the .NET, kernel, and memory labels Jan 23, 2025

github-actions bot changed the title ~~WIP on LINQ-based criteria filtering~~ .Net: WIP on LINQ-based criteria filtering Jan 23, 2025

roji had a problem deploying to integration January 23, 2025 13:44 — with GitHub Actions Failure

roji commented Jan 23, 2025


dmytrostruk reviewed Jan 23, 2025


Improve tests

7b259e8

roji had a problem deploying to integration January 23, 2025 15:40 — with GitHub Actions Failure

Support non-equal, flipped equal/non-equal

fe01f40

roji had a problem deploying to integration January 23, 2025 16:25 — with GitHub Actions Failure

westey-m reviewed Jan 23, 2025


roji changed the base branch from main to feature-vector-data-preb1 January 24, 2025 18:18

Address some review comments

46991f9

roji had a problem deploying to integration January 24, 2025 18:29 — with GitHub Actions Failure

Implement Contains over array fields

11c6b29

roji had a problem deploying to integration January 24, 2025 19:32 — with GitHub Actions Failure

roji had a problem deploying to integration January 25, 2025 00:29 — with GitHub Actions Failure

Fully implement logical operators, including NOT.

20a2aea

* Also inline captured variables.
* Also improve tests to check for none/all results.

roji force-pushed the LinqFiltering branch from ef07816 to 20a2aea January 25, 2025 00:39

roji had a problem deploying to integration January 25, 2025 00:39 — with GitHub Actions Failure

Support Contains over inline/captured enumerables

c9e9e6b

roji had a problem deploying to integration January 25, 2025 07:58 — with GitHub Actions Failure

Implement Redis support

cfd8111

roji had a problem deploying to integration January 25, 2025 11:17 — with GitHub Actions Failure
