-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: WIP on LINQ-based criteria filtering #10273
base: feature-vector-data-preb1
Are you sure you want to change the base?
Conversation
var right = this.Visit(andAlso.Right); | ||
|
||
// Qdrant doesn't allow arbitrary nesting of logical operators, only one MUST list (AND), one SHOULD list (OR), and one MUST_NOT list (AND NOT). | ||
// We can combine MUST and MUST_NOT; but we can only combine SHOULD if it's the *only* thing on the one side (no MUST/MUST_NOT), and there's no SHOULD on the other (only MUST/MUST_NOT). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am not 100% sure about Qdrant's capabilities here, should confirm.
@@ -20,6 +20,6 @@ public interface IVectorizableTextSearch<TRecord> | |||
/// <returns>The records found by the vector search, including their result scores.</returns> | |||
Task<VectorSearchResults<TRecord>> VectorizableTextSearchAsync( | |||
string searchText, | |||
VectorSearchOptions? options = default, | |||
VectorSearchOptions<TRecord>? options = default, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making VectorSearchOptions generic over TRecord isn't trivial... Modern C# users can use target-typed new, and in that case this doesn't interfere in the method calling. But people not doing target-typed new would have to explicitly specify the full type every time (even if they don't use filtering at all), which maybe isn't ideal. The alternative would be to move the LINQ filter to an optional parameter directly on the method (no strong feelings from my side here).
dotnet/src/Connectors/VectorData.Abstractions/VectorSearch/VectorSearchOptions.cs
Show resolved
Hide resolved
where TKey : notnull | ||
{ | ||
[Fact] | ||
public virtual async Task Equality_with_int() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can keep PascalCase as naming convention for test methods like in other tests in a codebase - that would be great :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mainly anticipating all these moving out to the Microsoft.Extensions repo, where IIRC the convention is like this. But I can rename to match SK conventions and change later if you prefer.
dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantBuilder.cs
Outdated
Show resolved
Hide resolved
dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantConfiguration.cs
Outdated
Show resolved
Hide resolved
dotnet/src/VectorDataTests/QdrantTests/Support/TestContainer/QdrantContainer.cs
Outdated
Show resolved
Hide resolved
@roji, a good scenario to also include in early exploration would be tag based operations, e.g. where there is an array of strings (tags) in a field in the database, and we want to check if one or more out of tags we provide match those. |
e.Int2 == a.Int2); | ||
} | ||
|
||
[Fact] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the issues I have been facing with abstract base class tests is that a few of the DBs don't perform well on the Build server for integration tests, so we typically run them manually. Mostly issues with failed startup and flakiness.
I generalized some work done for Pinecone that allows us to have attribute based test disabling, which allows us to have an attribute on the inheriting class that disables all tests in the inheriting and base class.
Check out VectorStoreFact
, PineconeApiKeySetConditionAttribute
and DisableVectorStoreTestsAttribute
for how this works.
Passing a Skip reason to Fact doesn't work because it requires a static reason, and it would just disable the tests for all inheritors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I remember this... I'll take a look.
But just off the top of my head, in CI we could run tests only for those providers where there's a good container-baesd option (like Qdrant), and not run them for the others (splitting the test projects by provider could also help with that a bit). Locally, developers can simply explicitly choose what it is they want to run - though that would mean that running all tests in the solution would produce failures. If that's not acceptable, a conditional [Fact] attribute like what you mention above could be good, based on some environment variable or whatever.
In any case, my hope is that this could be orthogonal to the tests themselves being general across providers, as opposed to written per-provider... But we'll see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quite like the idea of the test conditionality being based on whether the required secrets have been configured like in PineconeApiKeySetConditionAttribute, since it removes the need to change build settings in addition to setting the build secrets. Locally, I prefer skipped to failure, since failures need to be investigated, while the meaning of something being skipped is clear without further investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will do that.
@westey-m thanks - will do that, it should be easily representible in LINQ by doing |
* Also inline captured variables. * Also improve tests to check for none/all results.
This is some draft work on LINQ-based criteria filtering (#10156). Basic equality, AND and OR are implemented for Qdrant, and tested via a specification test suite that should be usable more or less as-is across all other providers (#10194). There's a lot more work here and many possible improvements - am submitting in early draft stage to gather early feedback.
To see what this looks like, see these tests.
Note that this adds new projects outside of the existing "IntegrationTests" project; it's difficult to work when all provider tests are in the same project (any experimental breaking change in the abstraction must be propagated everywhere), and in any case, I think we want to end up in a place where each provider has its own separate test project.
/cc @westey-m