* initial pass at PARTITION KEY support.
* Initial pass, allow auxiliary columns on vec0 virtual tables
* update TODO
* Initial pass at metadata filtering
* unit tests
* gha this PR branch
* fixup tests
* doc internal
* fix tests, KNN/rowids in
* define SQLITE_INDEX_CONSTRAINT_OFFSET
* whoops
* update tests, syrupy, use uv
* un ignore pyproject.toml
* dot
* tests/
* type error?
* win: .exe, update error name
* try fix macos python, paren around expr?
* win bash?
* dbg :(
* explicit error
* op
* dbg win
* win ./tests/.venv/Scripts/python.exe
* block UPDATEs on partition key values for now
* test this branch
* accidentally removved "partition key type mistmatch" block during merge
* typo ugh
* bruv
* start aux snapshots
* drop aux shadow table on destroy
* enforce column types
* block WHERE constraints on auxiliary columns in KNN queries
* support delete
* support UPDATE on auxiliary columns
* test this PR
* dont inline that
* test-metadata.py
* memzero text buffer
* stress test
* more snpashot tests
* rm double/int32, just float/int64
* finish type checking
* long text support
* DELETE support
* UPDATE support
* fix snapshot names
* drop not-used in eqp
* small fixes
* boolean comparison handling
* ensure error is raised when long string constraint
* new version string for beta builds
* typo whoops
* ann-filtering-benchmark directory
* test-case
* updates
* fix aux column error when using non-default rowid values, needs test
* refactor some text knn filtering
* rowids blob read only on text metadata filters
* refactor
* add failing test causes for non eq text knn
* text knn NE
* test cases diff
* GT
* text knn GT/GE fixes
* text knn LT/LE
* clean
* vtab_in handling
* unblock aux failures for now
* guard sqlite3_vtab_in
* else in guard?
* fixes and tests
* add broken shadow table test
* rename _metadata_chunksNN shadown table to _metadatachunksNN, for proper shadowName detection
* _metadata_text_NN shadow tables to _metadatatextNN
* SQLITE_VEC_VERSION_MAJOR SQLITE_VEC_VERSION_MINOR and SQLITE_VEC_VERSION_PATCH in sqlite-vec.h
* _info shadow table
* forgot to update aux snapshot?
* fix aux tests
ARCHITECTURE.md (+75 -7)
@@ -1,5 +1,51 @@
+# `sqlite-vec` Architecture
+
+Internal documentation for how `sqlite-vec` works under-the-hood. Not meant for
+users of the `sqlite-vec` project, consult
+[the official `sqlite-vec` documentation](https://alexgarcia.xyz/sqlite-vec) for
+how-to-guides. Rather, this is for people interested in how `sqlite-vec` works
+and some guidelines to any future contributors.
+
+Very much a WIP.
+
 ## `vec0`

+### Shadow Tables
+
+#### `xyz_chunks`
+
+- `chunk_id INTEGER`
+- `size INTEGER`
+- `validity BLOB`
+- `rowids BLOB`
+
+#### `xyz_rowids`
+
+- `rowid INTEGER`
+- `id`
+- `chunk_id INTEGER`
+- `chunk_offset INTEGER`
+
+#### `xyz_vector_chunksNN`
+
+- `rowid INTEGER`
+- `vector BLOB`
+
+#### `xyz_auxiliary`
+
+- `rowid INTEGER`
+- `valueNN [type]`
+
+#### `xyz_metadatachunksNN`
+
+- `rowid INTEGER`
+- `data BLOB`
+
+#### `xyz_metadatatextNN`
+
+- `rowid INTEGER`
+- `data TEXT`
+
 ### idxStr

 The `vec0` idxStr is a string composed of single "header" character and 0 or
@@ -14,8 +60,11 @@ The "header" charcter denotes the type of query plan, as determined by the
 |`VEC0_QUERY_PLAN_POINT`|`'2'`| Perform a single-lookup point query for the provided rowid |
 |`VEC0_QUERY_PLAN_KNN`|`'3'`| Perform a KNN-style query on the provided query vector and parameters. |

-Each 4-character "block" is associated with a corresponding value in `argv[]`. For example, the 1st block at byte offset `1-4` (inclusive) is the 1st block and is associated with `argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is associated with `argv[2]` and so on. Each block describes what kind of value or filter the given `argv[i]` value is.
-
+Each 4-character "block" is associated with a corresponding value in `argv[]`.
+For example, the 1st block at byte offset `1-4` (inclusive) is the 1st block and
+is associated with `argv[1]`. The 2nd block at byte offset `5-8` (inclusive) is
+associated with `argv[2]` and so on. Each block describes what kind of value or
+filter the given `argv[i]` value is.
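
The block layout described in this hunk can be sketched in C. This is a minimal illustration, not code from `sqlite-vec`: the helper names `idxstr_block_count` and `idxstr_block` are hypothetical, and blocks are counted from 1 as the text does (block 1 at byte offsets 1-4, block 2 at 5-8).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from sqlite-vec: number of 4-character
 * blocks that follow the 1-character header of an idxStr. Returns -1 if the
 * string is empty or the part after the header is not a multiple of 4. */
static int idxstr_block_count(const char *idxStr) {
  size_t n = strlen(idxStr);
  if (n < 1 || (n - 1) % 4 != 0) {
    return -1;
  }
  return (int)((n - 1) / 4);
}

/* Hypothetical helper: pointer to the k-th block, counting from 1 as the
 * text does, so block 1 covers byte offsets 1-4 and block 2 covers 5-8.
 * The first character of the returned block is its "kind". */
static const char *idxstr_block(const char *idxStr, int k) {
  assert(k >= 1 && k <= idxstr_block_count(idxStr));
  return idxStr + 1 + 4 * (size_t)(k - 1);
}
```

For instance, with the point-query string `"2!___"` the header is `'2'` and `idxstr_block(s, 1)[0]` is `'!'`, the `VEC0_IDXSTR_KIND_POINT_ID` marker followed by three `_` fillers.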

 #### `VEC0_IDXSTR_KIND_KNN_MATCH` (`'{'`)

@@ -31,24 +80,43 @@ The remaining 3 characters of the block are `_` fillers.

 #### `VEC0_IDXSTR_KIND_KNN_ROWID_IN` (`'['`)

-`argv[i]` is the optional `rowid in (...)` value, and must be handled with[`sqlite3_vtab_in_first()` /
 `argv[i]` is a "constraint" on a specific partition key.

-The second character of the block denotes which partition key to filter on, using `A` to denote the first partition key column, `B` for the second, etc. It is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.
+The second character of the block denotes which partition key to filter on,
+using `A` to denote the first partition key column, `B` for the second, etc. It
+is encoded with `'A' + partition_idx` and can be decoded with `c - 'A'`.

-The third character of the block denotes which operator is used in the constraint. It will be one of the values of `enum vec0_partition_operator`, as only a subset of operations are supported on partition keys.
+The third character of the block denotes which operator is used in the
+constraint. It will be one of the values of `enum vec0_partition_operator`, as
+only a subset of operations are supported on partition keys.

 The fourth character of the block is a `_` filler.
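
The partition-key block described above (kind, partition column, operator, filler) could be written and read roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the block's kind character and the operator characters are passed in by the caller because their actual values are not shown in this diff.

```c
#include <assert.h>

/* Hypothetical helpers, not copied from sqlite-vec. */

/* Encode a 0-based partition key column index as 'A', 'B', ... */
static char partition_idx_encode(int partition_idx) {
  /* The 26-column bound is only this sketch's guard so the result stays
   * within 'A'..'Z'; it is an assumption, not a documented limit. */
  assert(partition_idx >= 0 && partition_idx < 26);
  return (char)('A' + partition_idx);
}

/* Decode the second character of a block back into a column index. */
static int partition_idx_decode(char c) {
  return c - 'A';
}

/* Fill one 4-character block: kind, partition column, operator, filler.
 * `kind` and `op` are placeholders supplied by the caller; the real values
 * (the block kind and enum vec0_partition_operator) are not shown here. */
static void partition_block_write(char out[4], char kind, int partition_idx,
                                  char op) {
  out[0] = kind;
  out[1] = partition_idx_encode(partition_idx);
  out[2] = op;
  out[3] = '_';
}
```

Decoding the second character with `partition_idx_decode` mirrors the `c - 'A'` expression in the text, so a block whose second character is `'B'` refers to the second partition key column (index 1).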

-
 #### `VEC0_IDXSTR_KIND_POINT_ID` (`'!'`)

 `argv[i]` is the value of the rowid or id to match against for the point query.

 The remaining 3 characters of the block are `_` fillers.