-
Notifications
You must be signed in to change notification settings - Fork 48
enable adding indexes to collections to speed up initial data lookups #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: e526c1e The changes in this PR will be included in the next version bump. This PR includes changesets to release 6 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@tanstack/db-example-react-todo @tanstack/db
@tanstack/electric-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/vue-db
commit: |
Size Change: +3.65 kB (+9.68%) Total Size: 41.4 kB
ℹ️ View Unchanged
|
Size Change: 0 B Total Size: 1.05 kB ℹ️ View Unchanged
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR needs some additional work to minimize the bookkeeping overhead of the index range and to improve the code quality by introducing the necessary abstractions wrt indexes.
* Represents an index for fast lookups on a collection | ||
* All indexes are ordered to support range queries | ||
*/ | ||
export interface CollectionIndex< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more appropriate for CollectionIndex
to be a class instead of a plain JS object. Right now, it's an object leaking internal data structures to whoever wants to use it. As a result, the CollectionImpl
class now contains a lot of additional complexity for building and maintaining indexes. It would be better to abstract all of that away behind a CollectionIndex
class. That way, the changes to CollectionImpl
will be minimal.
So, I'd rather have a CollectionIndex
that implements an index and is a black box (i.e. all implementation details are contained within the class and the caller doesn't need to know about them). That also simplifies code evolution since we could swap out internal data structures without requiring changes to the calling code, as long as the public interface remains the same.
Such a CollectionIndex
class would expose an add(value, key)
method which will then update its internal data structures such as valueMap
, orderedEntries
, and indexedKeys
. It would also expose methods for equality lookups and range scans.
this.validateCollectionUsable(`createIndex`) | ||
|
||
// Generate unique ID for this index | ||
const indexId = `index_${++this.indexCounter}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just the indexCounter
number?
|
||
// Check if this is a simple field comparison: field op value | ||
|
||
if (leftArg.type === `ref` && rightArg.type === `val`) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the other way around?
e.g. imagine someone wrote lte(18, user.age)
.
} | ||
|
||
// All arguments must be optimizable | ||
return expression.args.every((arg) => this.canOptimizeExpression(arg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we optimize the parts that are optimizable to already gain some speedup there?
const fieldPath = fieldArg.path | ||
|
||
// Find an index that matches this field | ||
for (const index of this.indexes.values()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this exact code also in canOptimizeSimpleComparison
. Let's extract it to a function findIndex
.
|
||
if (results.length > 0) { | ||
// Intersect all matching keys (AND logic) | ||
let intersectedKeys = results[0]!.matchingKeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move all this set intersection logic to a utility function
const expression = toExpression(whereExpression) | ||
const evaluator = compileSingleRowExpression(expression) | ||
const result = evaluator(item as Record<string, unknown>) | ||
return Boolean(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to convert it to a boolean? Should't a WHERE clause always be a predicate (i.e. evaluate to a boolean value) ?
- Updated utils.ts to use Map instead of WeakMap for cycle detection (from main) - Kept all collection indexing utility functions - Updated optimizer.ts to use main branch patterns while preserving enhancements - Added distinct import to compiler/index.ts from main branch - Fixed non-null assertions in compiler to match main branch style - All review comments from PR #257 remain addressed
95b29d8
to
e526c1e
Compare
stacked on #256
Indexes are created like so:
the callback uses the same expression builder to extract a value to index as the query builder does for where classes.
You can add as many indexes to a collection as you like, and they can be created after the collection was created initially.
This pr does not link this up to live queries... thats in #258