-
-
Notifications
You must be signed in to change notification settings - Fork 38
[Store][Postgres] allow store initialization with utilized distance #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for this PR. I'm using So in order to test, I changed in the constructor, and double check it's well configurated (current operator : diff --git a/src/store/src/Bridge/Postgres/Store.php b/src/store/src/Bridge/Postgres/Store.php
index 07bff22..28d4ba5 100644
--- a/src/store/src/Bridge/Postgres/Store.php
+++ b/src/store/src/Bridge/Postgres/Store.php
@@ -34,7 +34,7 @@ final readonly class Store implements VectorStoreInterface, InitializableStoreIn
private \PDO $connection,
private string $tableName,
private string $vectorFieldName = 'embedding',
- private Distance $distance = Distance::L2,
+ private Distance $distance = Distance::Cosine,
) {
} I have crawled https://jolicode.com and https://www.premieroctet.com, indexed all their content, and run the following code: $rows = $connection->executeQuery("select * from {$_SERVER['PLATFORM']}")->fetchAllAssociative();
foreach ($rows as $row) {
$metadata = json_decode($row['metadata'], true, 512, \JSON_THROW_ON_ERROR);
$vector = new Vector(json_decode($row['embedding'], true));
$documents = $store->query($vector, [], 0.000001); //Hack to not get "current row"
if (!$documents) {
continue;
}
echo "Current document: {$metadata['url']}\n";
echo "Found " . count($documents) . " similar documents:\n";
foreach ($documents as $i => $document) {
echo "- {$document->metadata['url']} (score: {$document->score})\n";
// break;
}
die;
echo "\n";
} So:
|
e966024
to
b4e30f5
Compare
$uuid = Uuid::v4(); | ||
$vectorData = [0.1, 0.2, 0.3]; | ||
$minScore = 0.8; | ||
$pdo = $this->createMock(\PDO::class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know the testing strategy on this project yet. But this kind of test tests nothing. All the important things are mocked.
IMHO, it would be much better to use a real instance of pgvector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Yeah ... the tests just testing the query is correctly build. But nethertheless i have to remove the tests because there was a merge of tests. But still unit and not functional tests. I would keep with them for now to have just a look that the correct comparison method is utilized.
Thanks, @lyrixx ! I've already added it to the named constructors. Your results generally is looking totally fine to me. The cosine distance, which is being used, returns a value between In the query, the filtering could be problematic. The term The score problem in filtering seems also be valid for the L2 distance as this is a value from 0 to infinite, where 0 is the most fitting match. It seems the |
b4e30f5
to
8c7735a
Compare
Thanks you very much for the explanation. Very clear. And I agree with you for the minScore. May be the name could be "treashhold". It kinda generic haha |
According to the pgvector documentation there are multiple distance calculations allowed. The current implementation in the store is only the L2 distance with the usage of
<->
. Allowing to utilize the other distance calculation variants would be useful here as mostly the discussion seem to go around the cosine algorithm.